Estimating operator norms using covering nets


Fernando G.S.L. Brandão* Aram W. Harrow†

Abstract 

We present several polynomial- and quasipolynomial-time approximation schemes for a large class of
generalized operator norms. Special cases include the $2 \to q$ norm of matrices for $q > 2$, the support
function of the set of separable quantum states, finding the least noisy output of entanglement-breaking
quantum channels, and approximating the injective tensor norm for a map between two Banach spaces
whose factorization norm through $\ell_1^n$ is bounded.

These reproduce and in some cases improve upon the performance of previous algorithms by
Brandão-Christandl-Yard [BCY11] and follow-up work, which were based on the Sum-of-Squares hierarchy and
whose analysis used techniques from quantum information such as the monogamy principle of entanglement.
Our algorithms, by contrast, are based on brute-force enumeration over carefully chosen covering
nets. These have the advantage of using less memory, having much simpler proofs and giving new
geometric insights into the problem. Net-based algorithms for similar problems were also presented by
Shi-Wu [SW12] and Barak-Kelner-Steurer [BKS13], but in each case with a run-time that is exponential
in the rank of some matrix. We achieve polynomial or quasipolynomial runtimes by using the much
smaller nets that exist in $\ell_1$ spaces. This principle has been used in learning theory, where it is known
as Maurey's empirical method.


1 Introduction 

Given an $n \times m$ matrix $M$, its operator norm is given by $\|M\| = \max_{x \in \mathbb{C}^m} \|Mx\|_2 / \|x\|_2$, with
$\|x\|_2 = (\sum_i |x_i|^2)^{1/2}$ the Euclidean norm. The operator norm is also given by the square root of the largest eigenvalue
of $M^\dagger M$ and thus can be efficiently computed. There are numerous ways of generalizing the operator norm,
e.g. by considering tensors instead of matrices, by changing the Euclidean norm to another norm, or by
considering other vector spaces instead of $\mathbb{C}^m$. Although such generalizations are very useful in applications,
they can be substantially harder to compute than the basic operator norm, and in many cases we still do 
not have a good grasp of the computational complexity of computing, or even only approximating, them. 
In some cases quasipolynomial algorithms are known, usually based on semidefinite programming (SDP) 
hierarchies, and in other cases quasipolynomial hardness results are known. These are partially overlapping 
so that some problems have sharp bounds on their complexity and for others there are exponential gaps 
between the best upper and lower bounds. As we will discuss below, the complexity of these problems is not 
only a basic question in the theory of algorithms, but also is closely related to the unique games conjecture 
and the power of multiprover quantum proof systems. 
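To make the easy base case concrete, the following minimal sketch (not part of the paper) estimates $\|M\|$ for a small real matrix by power iteration on $M^T M$. The function name `operator_norm`, the iteration count and the seeded Gaussian start vector are illustrative assumptions; the start vector is assumed not to be orthogonal to the top singular direction, which holds with probability one for a random start.

```python
import math
import random

def operator_norm(M, iters=500, seed=0):
    """Estimate ||M|| = sqrt(lambda_max(M^T M)) for a real matrix M (list of rows)."""
    n_rows, n_cols = len(M), len(M[0])
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]  # generic start vector
    for _ in range(iters):
        # y = M x, then z = M^T y, i.e. one application of M^T M
        y = [sum(M[r][c] * x[c] for c in range(n_cols)) for r in range(n_rows)]
        z = [sum(M[r][c] * y[r] for r in range(n_rows)) for c in range(n_cols)]
        norm_z = math.sqrt(sum(v * v for v in z))
        if norm_z == 0.0:
            return 0.0  # for a generic start this only happens when M = 0
        x = [v / norm_z for v in z]
    # x is now a unit vector close to the top right singular vector
    y = [sum(M[r][c] * x[c] for c in range(n_cols)) for r in range(n_rows)]
    return math.sqrt(sum(v * v for v in y))
```

No comparably simple procedure is known for the generalized norms discussed next, which is what motivates the net-based algorithms of this paper.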

In this paper we give new algorithms for several variants of the basic operator norm of interest in quantum 
information theory, theoretical computer science, and the theory of Banach spaces. Unlike most past work 
which was based on SDP hierarchies, our algorithms simply enumerate over a carefully chosen net of points. 
This yields run-times that often match the SDP hierarchies and sometimes improve upon them. Besides 
improved performance, our algorithms have the advantage of being based on simple geometric properties of 
spaces we are optimizing over, which may help explain which types of norms are amenable to quasipolynomial 
optimization. In particular we consider the following four optimization problems in this work: 

Optimization over Separable States: An important problem in quantum information theory is 
to optimize a linear function over the set of separable (i.e. non-entangled) states, defined as bipartite 

* (a) Quantum Architectures and Computation Group, Microsoft Research, Redmond, WA and (b) Department of Computer 
Science, University College London WC1E 6BT. email: fbrandao@microsoft.com 

† Center for Theoretical Physics, Massachusetts Institute of Technology, email: aram@mit.edu











density matrices that can be written as a convex combination of tensor product states. This problem is 
closely related to the task of determining if a given quantum state is entangled or not (called the quantum 
separability problem) and to the computation of several other quantities of interest in quantum information, 
including the optimal acceptance probability of quantum Merlin-Arthur games with unentangled proofs, 
optimal entanglement witnesses, mean-field ground-state energies, and measures of entanglement; see [HM13]
for a review of many of these connections. 

Given an operator $M$ acting on the bipartite vector space $\mathbb{C}^{d_1} \otimes \mathbb{C}^{d_2}$, the support function of $M$ on the
set of separable states is given by

$$h_{\mathrm{Sep}(d_1,d_2)}(M) := \max_{\alpha \in \mathcal{D}_{d_1},\, \beta \in \mathcal{D}_{d_2}} \mathrm{tr}[M(\alpha \otimes \beta)], \tag{1}$$

with $\mathcal{D}_d$ the set of density matrices on $\mathbb{C}^d$ ($d \times d$ positive semidefinite matrices of unit trace). Our goal is to
approximate $h_{\mathrm{Sep}(d_1,d_2)}(M)$. For $M \in L((\mathbb{C}^d)^{\otimes n})$, define

$$h_{\mathrm{Sep}^n(d)}(M) = \max_{\alpha_1,\ldots,\alpha_n \in \mathcal{D}_d} \mathrm{tr}[M(\alpha_1 \otimes \cdots \otimes \alpha_n)]. \tag{2}$$

The first result on the complexity of computing $h_{\mathrm{Sep}(d_1,d_2)}$ was negative: Gurvits showed that the problem
is NP-hard for sufficiently small additive error (inverse polynomial in $d_1 d_2$) [Gur03]. Then [HM13] showed
there is no $\exp(O(\log^{2-\Omega(1)}(d_1 d_2)))$-time algorithm even for a constant-error additive approximation of the
quantity, assuming the exponential time hypothesis (ETH).¹ This left open the question whether there are
quasipolynomial-time algorithms (i.e. of time $\exp(\mathrm{polylog}(d_1, d_2))$).

In [BCY11] it was shown that this is indeed the case at least for a class of linear functions: namely those
corresponding to quantum measurements that can be implemented by local operations and one-directional
classical communication (one-way LOCC or 1-LOCC). For this particular class of measurements the problem
can be solved with error $\delta$ in time $\exp(O(\delta^{-2} \log(d_1) \log(d_2)))$. The proof was based on showing that the
hierarchy of semidefinite programs for the problem introduced in 2004 by Doherty, Parrilo and Spedalieri
[DPS04] (which is an application of the more general Sum-of-Squares (SoS) hierarchy, also known as the
Lasserre hierarchy, to the separability problem) converges quickly. The approach of [BCY11] was to use ideas
from quantum information theory (monogamy of entanglement, entanglement measures, hypothesis testing,
etc.) to find good bounds on the quality of the SoS hierarchy. Since then several follow-up works have given different
proofs of the result, but always using quantum information-theoretic ideas [BH13, LW14, BC11, Yan06].

A corollary of [BCY11] and the other results on 1-LOCC $M$ is that $h_{\mathrm{Sep}(d_1,d_2)}(M)$ can also be approximated
for a different class of operators $M$: those with small Hilbert-Schmidt norm $\|M\|_{\mathrm{HS}} := \mathrm{tr}(M^\dagger M)^{1/2}$.
Ref. [BaCY11] showed that also in this case there is a quasipolynomial-time algorithm for estimating Eq. (1).
An interesting subsequent development was the work of Shi and Wu [SW12] (see also [BKS13]), who
gave a different algorithm for the problem based on enumerating over nets. It was left as an open question
whether a similar approach could be given for the case of one-way LOCC measurements (which is more
relevant both physically² and in terms of applications; see again [HM13]).

Estimating the Output Purity of Quantum Channels: Another important optimization problem
in quantum information theory consists of determining how much noise a quantum channel introduces. A
quantum channel models a general physical evolution and is given mathematically by a completely positive
trace preserving map $\Lambda : \mathcal{D}_{d_1} \to \mathcal{D}_{d_2}$. One way to measure the level of noise of the channel is to compute
the maximum over states of the output Schatten-$\alpha$ norm, for a given $\alpha > 1$:

$$\|\Lambda\|_{1 \to \alpha} := \max_{\rho \in \mathcal{D}_{d_1}} \|\Lambda(\rho)\|_\alpha, \tag{3}$$

¹ The ETH is the conjecture that 3-SAT instances of length $n$ require time $2^{\Omega(n)}$ to solve. This is a plausible conjecture for
deterministic, randomized or quantum computation, and each version yields a corresponding lower bound on the complexity of
estimating $h_{\mathrm{Sep}}$.

² The one-way LOCC norm gives the optimal distinguishability of two multipartite quantum states when only local measurements
can be done, and the parties can coordinate by one-directional communication. See [MWW09] for a discussion of its
power.
with $\|Z\|_\alpha = \mathrm{tr}(|Z|^\alpha)^{1/\alpha}$. The quantity $\|\Lambda\|_{1 \to \alpha}$ varies from one, for an ideal channel, to $d_2^{1/\alpha - 1}$ for the
depolarizing channel mapping all states to the maximally mixed state. This optimization problem has been
extensively studied, in particular because for $\alpha$ close to 1 it is related to the Holevo capacity of the channel, whose
regularization gives the classical capacity of the channel (i.e. how many reliable bits can be transmitted per
use of the channel).
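As a quick check of the two extreme values (a worked computation from the definition of the Schatten-$\alpha$ norm, added here for illustration), take a pure output state $|\psi\rangle\langle\psi|$ for the ideal channel and the maximally mixed state $I/d_2$ for the fully depolarizing one:

```latex
\big\| |\psi\rangle\langle\psi| \big\|_\alpha = \big(1^\alpha\big)^{1/\alpha} = 1,
\qquad
\Big\| \frac{I}{d_2} \Big\|_\alpha
  = \Big( \mathrm{tr}\,\big(\tfrac{I}{d_2}\big)^{\alpha} \Big)^{1/\alpha}
  = \big( d_2 \cdot d_2^{-\alpha} \big)^{1/\alpha}
  = d_2^{\frac{1}{\alpha} - 1} .
```

Since $\alpha > 1$, the second value is strictly smaller than one and decreases as $d_2$ grows.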

It was shown in [HM13] that, assuming ETH, there is no algorithm that runs in time $\exp(O(\log^{2-\Omega(1)}(d)))$
and can decide if $\|\Lambda\|_{1 \to \alpha}$ is one or smaller than $\delta$ (for any fixed $\delta > 0$ and $\alpha > 1$) for a general quantum
channel $\Lambda$. On the algorithmic side, nothing better than exhaustive search over the input space
(taking time $\exp(\Omega(d_1))$) is known.

An interesting subclass of quantum channels, lying somewhere between classical channels and fully quantum
channels, are the so-called entanglement-breaking channels, which are the channels that cannot be used
to distribute entanglement. Any entanglement-breaking quantum channel $\Lambda$ can be written as [HSR04]:

$$\Lambda(\rho) := \sum_i \mathrm{tr}(X_i \rho)\, Y_i, \tag{4}$$

with $Y_i \ge 0$, $\mathrm{tr}(Y_i) = 1$ quantum states and $X_i \ge 0$, $\sum_i X_i = I$ a quantum measurement. Because
of their simpler form, one can expect that there are more efficient algorithms for computing the maximum
output norm of entanglement-breaking channels. However until now no algorithm better than exhaustive
search was known either (apart from the case $\alpha = \infty$ where the Sum-of-Squares hierarchy can be used and
analyzed using [BCY11]).

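Computing $p \to q$ Norms: Given a $d_1 \times d_2$ matrix $A$ we define its $p \to q$ norm by

$$\|A\|_{p \to q} := \max_{x \in \mathbb{C}^{d_2}} \frac{\|Ax\|_q}{\|x\|_p}, \qquad \|x\|_p := \Big( \sum_i |x_i|^p \Big)^{1/p}. \tag{5}$$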


Such norms have many different applications, from hypercontractive inequalities and determining
whether a graph is a small-set expander [BBH+12], to oblivious routing [BV11] and robust optimization [Ste05].
However we do not have a complete understanding of the complexity of computing them. For $2 < q < p$ or
$q < p < 2$, it is NP-hard to approximate them to any constant factor [BV11]. In the regime $q > p$ (the one
relevant for hypercontractivity and small-set expansion) the only known hardness result is that obtaining any
(multiplicative) constant-factor approximation for the $2 \to 4$ norm of an $n \times n$ matrix is as hard as solving
3-SAT with $O(\log^2(n))$ variables [BBH+12].

On the algorithmic side, besides the $2 \to 2$ and $2 \to \infty$ norms being exactly computable in polynomial
time, Ref. [BBH+12] showed that one can use the Sum-of-Squares hierarchy to compute in time
$\exp(O(\log^2(n)\,\epsilon^{-2}))$ a number $X$ s.t.

$$\|A\|_{2 \to 4}^4 \le X \le \|A\|_{2 \to 4}^4 + \epsilon\, \|A\|_{2 \to 2}^2 \|A\|_{2 \to \infty}^2. \tag{6}$$


Whether similar approximations can be obtained for $2 \to q$ norms for other values of $q$ was left as an
open problem.
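To illustrate the definition in the simplest possible setting, here is a minimal sketch (an illustration, not an algorithm from this paper) that lower-bounds $\|A\|_{2 \to q}$ for a real matrix with two columns by scanning a uniform grid of unit vectors $(\cos t, \sin t)$. The function name `norm_2_to_q_lower_bound` and the grid resolution are assumptions made for this example; in higher dimensions such a naive grid grows exponentially, which is exactly the cost the smaller nets studied in this paper are meant to avoid.

```python
import math

def norm_2_to_q_lower_bound(A, q, num_angles=3600):
    """Lower-bound ||A||_{2->q} for a real matrix A with exactly 2 columns."""
    best = 0.0
    for j in range(num_angles):
        # Half a circle suffices because ||A(-x)||_q = ||Ax||_q.
        t = math.pi * j / num_angles
        x = (math.cos(t), math.sin(t))  # unit vector in the Euclidean norm
        Ax = [row[0] * x[0] + row[1] * x[1] for row in A]
        value = sum(abs(v) ** q for v in Ax) ** (1.0 / q)
        best = max(best, value)
    return best
```

For $q = 2$ the scan recovers the ordinary operator norm up to the grid resolution, while for $q > 2$ the result is never larger, since $\|y\|_q \le \|y\|_2$.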

Computing the Operator Norm between Banach Spaces: These problems are all special cases of
the following general question. Given a map $T : A \to B$ between Banach spaces $A, B$, can we approximately
compute the following operator norm?

$$\|T\|_{A \to B} := \sup_{x \ne 0} \frac{\|Tx\|_B}{\|x\|_A} \tag{7}$$


1.1 Summary of Results 

In this paper we give new algorithmic results for the four problems discussed above. They can be summarized 
as follows. 
Separable-state optimization by covering nets: We give a different algorithm for optimizing linear
functions over separable states (corresponding to one-way LOCC measurements) based on enumerating over
covering nets (see Algorithm 1). The complexity of the algorithm matches the time complexity of [BCY11]
(see Theorem 2). The proof does not use information theory in any way, nor the SoS hierarchy. Instead
the main technical tool is a matrix version of the Hoeffding bound (see Lemma 3). It gives new geometric
insight into the problem and gives arguably the simplest and most self-contained proof of the result to date.
It also gives an explicit rounding (as does [BH13], but in contrast to [BCY11, LW14, BC11, Yan06]).

For particular subclasses of one-way LOCC measurements our algorithm improves the run time of
[BCY11]. One example is the case where Bob's measurement outcomes are low rank, in which we find
a $\mathrm{poly}(d_2)\, d_1^{O(\epsilon^{-2})}$-time algorithm.

Generalization to arbitrary operator norms: Computing $h_{\mathrm{Sep}}$ is mathematically equivalent to
computing the $1 \to \infty$ norm of a quantum channel, or more precisely the $S_1 \to S_\infty$ norm where $S_\alpha$ denotes
the Schatten-$\alpha$ norm. This perspective will help us generalize the scope of our algorithm, to estimating the
$S_1 \to B$ norm for a general Banach space $B$. The analysis of this algorithm is based on tools from asymptotic
geometric analysis, and we will see that its efficiency depends on properties of $B$ known as the Rademacher
type and the modulus of uniform smoothness. Besides generalizing the scope of the algorithm, this also gives
more of a geometric explanation of its performance. We focus on two special cases of the problem:

1. maximum output norm: A particular case of the generalization is the problem of computing the
maximum output purity of a quantum entanglement-breaking channel (measured in the Schatten-$\alpha$

norms). We prove that for any a > 1 one can compute HAHi^q, in time poly(d 2 )d^ ' to within 
additive error e. (see Corollary [l6|). In contrast known hardness results [HM101IILVI LSI show that no 
such algorithm exists for general quantum channels (under the exponential time hypothesis). Previously 
the entanglement-breaking case was not known to be easier. 

2. matrix $2 \to q$ norms: As a second particular case of the general framework we extend the approximation
of [BBH+12] for the $2 \to 4$ norm, given in Eq. (6), to the $2 \to q$ norms for all $q > 2$ (see
Corollary 17).

Operator norms between Banach spaces: This framework can be further generalized to estimating
the operator norm of any linear map from $A \to B$ for Banach spaces $A, B$. Here we have replaced $S_1$
with any finite-dimensional Banach space $A$ whose norm can be computed efficiently. When applied to an
operator $\Lambda$, the approximation error scales with the $A \to \ell_1^n \to B$ factorization norm, which is the minimum
of $\|\Lambda_1\|_{\ell_1^n \to B} \|\Lambda_2\|_{A \to \ell_1^n}$ such that $\Lambda = \Lambda_1 \Lambda_2$. Factorization norms have applications to communication
complexity [LS07, LS09], Banach space theory [Pie07], and machine learning [LRS+10], and here we argue
that they help explain what makes the class of 1-LOCC measurements uniquely tractable for algorithms.
In Section 4 we describe an algorithm for this general norm estimation problem, which to our knowledge
previously had no efficient algorithms. This problem equivalently can be viewed as computing the injective
tensor norm of two Banach spaces.

We remark that this generalization is not completely for free, so we cannot simply derive all our other
algorithms from this final one. In the case where $A = S_1^d$ (which corresponds to all of our specific applications),
we are able to easily sparsify the input; i.e. given $\sum_{i=1}^n A_i \otimes B_i$, we can reduce $n$ to be $\mathrm{poly}(d)$
without loss of generality. For general input spaces $A$ we do not know if this is possible. Also, the case of
$h_{\mathrm{Sep}}$ is much simpler, and so it may be helpful to read it first.

1.2 Comparison with prior work 

As discussed in the introduction, previous algorithms for separable-state optimization (and as a corollary,
the $2 \to 4$ norm) have been obtained using SDP hierarchies. Our algorithms generally match or improve
upon their parameters, but with the added requirement for the separable-state problem that the input be
presented in a more structured form.
Several parallels between LP/SDP hierarchies and net-based algorithms have been developed for other
problems. The first example of this was Ref. [DKLP06a], which gave both types of algorithms for the problem
of maximizing a polynomial over the simplex, improving on a result implicit in the 1980 proof of the finite
de Finetti theorem by Diaconis and Freedman [DF80]. Besides the separable-state approximation problem
that we study, hierarchies and nets have been found to have similar performance in finding approximate
Nash equilibria [LMM03, Har15] and in estimating the value of free two-prover games [AIM14, BH13]. The
state-of-the-art run-times for solving Unique Games and Small Set Expansion have also been achieved using
both hierarchies and covering nets. These parallels are summarized in the table:


Problem                          nets                         hierarchies/information theory

$\max_{x \in \Delta_n} p(x)$     [DKLP06a]                    [DF80, DKLP06a]
approximate Nash                 [LMM03, ALSV13a]             [Har15]
free games                       [AIM14]                      [BH13, Cor 4]
unique games                     [ABS10]                      [BRS11]
small-set expansion              [ABS10]                      [BBH+12, §10]
separable states                 [SW12, BKS13], this work     [BaCY11, BH13, BKS13, LW14, LS14]


Table 1: We briefly describe these problems here. Full descriptions can be found in the references in the
table. In $\max_{x \in \Delta_n} p(x)$, $\Delta_n$ is the $n$-dimensional probability simplex and $p(x)$ is a low-degree polynomial.
“Approximate Nash” refers to the problem of finding a pair of strategies in a two-player non-cooperative
game for which no player can improve their welfare by more than $\epsilon$. “Free games” refers to two-prover
one-round proof systems where the questions asked are independent; the computational problem is to estimate
the largest possible acceptance probability. “Unique games” describes instead proof systems with “unique”
constraints; i.e. for each question pair and each answer given by one of the provers, there is exactly one
correct answer possible for the other prover. Small-set expansion asks, given a graph $G$ and parameters
$\epsilon, \delta > 0$, whether all subsets with a $\delta$ fraction of the vertices have a $\ge 1 - \epsilon$ fraction of edges leaving the
set or whether there exists one with a $\le \epsilon$ fraction of edges leaving the set. Finally “separable states” refers
to estimating $h_{\mathrm{Sep}(n,n)}$ as we will discuss elsewhere in the paper. It can also be thought of as estimating
$\max_{\|x\|_2 = 1} p(x)$ for some low-degree polynomial $p(x)$.


While this paper focuses on the particular problems where we can improve upon the state-of-the-art 
algorithms, we hope to be a step towards more generally understanding the connections between these two 
methods. In almost every case above, the best covering-net algorithms achieve nearly the same complexity 
as the best analyses of SDP hierarchies. There are a few exceptions. Ref. [BBH+12] shows that $O(1)$ rounds
of the SoS hierarchy can certify a small value for the Khot-Vishnoi integrality gap instances of the unique
games problem, but we do not know how to achieve something similar using nets. A more general example
is in [BKS13], which shows that the SoS hierarchy can approximate $h_{\mathrm{Sep}}(M)$ in quasipolynomial time when
$M$ is entrywise nonnegative.

The closest related paper to this work is [SW12] by Shi and Wu (as well as Appendix A of [BKS13]),
which also used enumeration over $\epsilon$-nets to approximate $h_{\mathrm{Sep}}$. Here we explain their results in our language.

Shi and Wu [SW12] have two algorithms: one when $M$ has low Schmidt rank (i.e. factorizes as
$S_1 \to \ell_2^r \to S_\infty$ for small $r$) and one where $M$ has low rank, which we can interpret as an $\ell_2^r \to S_\infty^{d_1 \times d_2}$ factorization
(here $S_\infty^{d_1 \times d_2}$ refers to the space of $d_1 \times d_2$-dimensional matrices with norm given by the largest singular
value). These correspond to their Theorems 5 and 8 respectively. In both cases they construct $\epsilon$-nets for
the $\ell_2^r$ unit ball, whose size is exponential in $r$ (here, $\ell_2$ could be replaced with any norm; see Lemma 9.5 of [LT91]). In both
cases, their results can be improved to yield multiplicative approximations, using ideas from [BKS13].

Appendix A of Barak, Kelner and Steurer [BKS13] considers fully symmetric 4-index tensors $M \in (\mathbb{R}^n)^{\otimes 4}$,
so that when viewed as $n^2 \times n^2$ matrices their rank and Schmidt rank are the same; call them $r$. Their
algorithm is similar to that of [SW12], although they observe additionally (using different terminology) that
for any self-adjoint operator $T : A^* \to A$ (i.e. satisfying $\langle T(X), Y \rangle = \langle X, T(Y) \rangle$) the $A^* \to \ell_2 \to A$ norm is
equal to the $A^* \to A$ norm. This means that constructing an $\epsilon$-net for $B(\ell_2)$ actually yields a multiplicative
approximation of the $A^* \to A$ norm (here the $S_1 \to S_\infty$ norm).

Achieving a multiplicative approximation is stronger than what our algorithms achieve, but it is at the 
cost of a runtime that can be exponential in the input size even for a constant-factor approximation. By 
contrast, our algorithms yield nontrivial approximations in polynomial or quasipolynomial time. 

1.3 Notation 

Define the sets of $d \times d$ real and complex semidefinite matrices by $\mathcal{S}_+^d, \mathcal{H}_+^d$ respectively. For complex vector
spaces $V, W$, define $\mathcal{L}(V, W)$ to be the set of linear operators from $V$ to $W$, $\mathcal{L}(V) := \mathcal{L}(V, V)$ and
$\mathcal{H}(V), \mathcal{H}_+(V)$ to be respectively the Hermitian and positive-semidefinite operators on $V$.

For $\alpha \ge 1$ define the $\ell_\alpha, S_\alpha$ norms on vectors and matrices respectively by $\|x\|_{\ell_\alpha} = (\sum_i |x_i|^\alpha)^{1/\alpha}$ and
$\|X\|_{S_\alpha} = (\mathrm{tr}\, |X|^\alpha)^{1/\alpha}$. Denote the corresponding normed spaces by $\ell_\alpha^n$ and $S_\alpha^d$. Where it is clear from context
we will refer to both norms by $\|\cdot\|_\alpha$. We use $\|\cdot\|$ without subscript to denote the operator norm for matrices
(i.e. $\|X\| = \|X\|_{S_\infty}$) and the Euclidean norm for vectors (i.e. $\|x\| = \|x\|_{\ell_2}$).

We use $\tilde{O}(f(x))$ to mean $O(f(x)\, \mathrm{poly} \log(f(x)))$ and say that $f(x)$ is “quasipolynomial” in $x$ if
$f \le O(\exp(\mathrm{polylog}(x)))$.

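For a normed space $V$, define $B(V) = \{v \in V : \|v\| \le 1\}$. Two important special cases are the
probability simplex $\Delta_n := B(\ell_1^n) \cap \mathbb{R}^n_{\ge 0}$ and the set of density matrices (also called “quantum states”)
$\mathcal{D}_d := B(S_1^d) \cap \mathcal{H}_+^d = \mathrm{conv}\{vv^* : v \in B(\ell_2^d)\}$. Here $v^*$ is the conjugate transpose of $v$. For $k$ a positive
integer, define also

$$\Delta_n(k) := \Big\{ \frac{e_{i_1} + \cdots + e_{i_k}}{k} : i_1, \ldots, i_k \in [n] \Big\} \subset \Delta_n, \tag{8}$$

where $e_i$ is the vector in $\mathbb{R}^n$ with a 1 in position $i$ and zeros elsewhere. For a convex set $K$ define the
support function $h_K(x) := \sup_{y \in K} \langle x, y \rangle$. For matrices $\langle \cdot, \cdot \rangle$ refers to the Hilbert-Schmidt inner product
$\langle X, Y \rangle := \mathrm{tr}(X^\dagger Y)$.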
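The set $\Delta_n(k)$ is the net that the algorithms below enumerate, so it may help to see it spelled out. The following minimal sketch (the name `simplex_net` and the use of exact fractions are choices made for this illustration) lists every point of $\Delta_n(k)$ as an average of $k$ standard basis vectors; each unordered choice of indices gives a distinct point, so the net has $\binom{n+k-1}{k} \le n^k$ elements.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb

def simplex_net(n, k):
    """Yield every point of Delta_n(k) = {(e_{i_1} + ... + e_{i_k}) / k}."""
    # The average only depends on the multiset {i_1, ..., i_k}, so it is
    # enough to range over sorted index tuples.
    for indices in combinations_with_replacement(range(n), k):
        point = [Fraction(0)] * n
        for i in indices:
            point[i] += Fraction(1, k)
        yield tuple(point)

def simplex_net_size(n, k):
    """Number of points in Delta_n(k)."""
    return comb(n + k - 1, k)
```

For example, `list(simplex_net(3, 2))` contains the three vertices of the simplex together with the three edge midpoints.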

Banach spaces are normed vector spaces with an additional condition (completeness, i.e. convergence of 
Cauchy sequences) that is relevant only in the infinite dimensional case. In this work we will consider only 
finite-dimensional Banach spaces. 


2 Warmup: algorithm for bipartite separability 

In this section we describe a simple version of our algorithm. It contains all the main ideas which we will
later generalize. Let $M = \sum_{i=1}^n X_i \otimes Y_i$, where $X_i \in \mathcal{H}_+^{d_1}$, $Y_i \in \mathcal{H}_+^{d_2}$, $\sum_i X_i \le I$, and each $Y_i \le I$. In quantum
information language, $M$ is a 1-LOCC measurement, meaning it can be implemented with local operations
and one-way classical communication.³ In later sections we will see that $M$ can also be interpreted in a
(mathematically) more natural way as a bounded map from $S_1$ to $S_\infty$. The goal of our algorithm is to
approximate $h_{\mathrm{Sep}(d_1,d_2)}(M)$, where we define the set of separable states as

$$\mathrm{Sep}(d_1, d_2) := \mathrm{conv}\{\alpha \otimes \beta : \alpha \in \mathcal{D}_{d_1}, \beta \in \mathcal{D}_{d_2}\}. \tag{9}$$

There have been several recent proofs [BCY11, BH13, LW14], each based on quantum information theory,
that SDP hierarchies can estimate $h_{\mathrm{Sep}(d_1,d_2)}(M)$ to error $\epsilon \|M\|$ in time $\exp(O(\log^2(d)/\epsilon^2))$. Similar
techniques also appeared in [BKS13, LS14] for different classes of operators $M$. The role of the 1-LOCC
conditions in these proofs was typically not completely obvious, and indeed it entered the proofs of
[BCY11, BH13, LW14] in three different ways. We now give another interpretation of it that is arguably
more geometrically natural.

³ Conventionally these have $\sum_i X_i = I$, but our formulation is essentially equivalent.
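Begin by observing that

$$\begin{aligned}
h_{\mathrm{Sep}(d_1,d_2)}(M)
  &= \max_{\alpha \in \mathcal{D}_{d_1},\, \beta \in \mathcal{D}_{d_2}} \mathrm{tr}[(\alpha \otimes \beta) M] \\
  &= \max_{\alpha \in \mathcal{D}_{d_1},\, \beta \in \mathcal{D}_{d_2}} \sum_{i=1}^n \mathrm{tr}[\alpha X_i]\, \mathrm{tr}[\beta Y_i] \\
  &= \max_{p \in S_X} \|p\|_Y.
\end{aligned} \tag{10}$$

In the last step we have defined

$$S_X := \{p \in \Delta_n : \exists\, \alpha \in \mathcal{D}_{d_1},\ p_i = \mathrm{tr}[\alpha X_i]\ \forall i \in [n]\}, \tag{11}$$

and

$$\|x\|_Y := \Big\| \sum_{i=1}^n x_i Y_i \Big\|.$$

The last equality in Eq. (10) holds because, for fixed $p$, the operator $\sum_i p_i Y_i$ is positive semidefinite, so maximizing
$\mathrm{tr}[\beta \sum_i p_i Y_i]$ over density matrices $\beta$ gives its largest eigenvalue, i.e. its operator norm.

The basic algorithm is the following: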


Algorithm 1 (Basic algorithm for computing $h_{\mathrm{Sep}}(M)$ for one-way LOCC $M = \sum_i X_i \otimes Y_i$).

Input: $\{X_i\}_{i=1}^n \subset \mathcal{H}_+^{d_1}$, $\{Y_i\}_{i=1}^n \subset \mathcal{H}_+^{d_2}$.

Output: States $\alpha \in \mathcal{D}_{d_1}$ and $\beta \in \mathcal{D}_{d_2}$.

1. Enumerate over all $p \in \Delta_n(k)$, with $k = 9 \ln(d_2)/\delta^2$.

(a) For each $p$, check (using Lemma 5) whether there exists $q \in S_X$ with $\|p - q\|_Y \le \delta/2$.

(b) If so, compute $\|q\|_Y$.

2. Let $q$ be such that $\|q\|_Y$ is the maximum, and let $\alpha \in \mathcal{D}_{d_1}$ be the state for which $q_i = \mathrm{tr}[X_i \alpha]$.
Output this $\alpha$ and $\beta$ satisfying $\mathrm{tr}[\beta \sum_i q_i Y_i] = \|\sum_i q_i Y_i\|$.

The main result of this section is: 

Theorem 2. Let $M = \sum_{i=1}^n X_i \otimes Y_i$ be such that $\sum_i X_i \le I$, $X_i \ge 0$, $0 \le Y_i \le I$. Algorithm 1 runs in time
$\mathrm{poly}(d_1, d_2, n) \exp(O(\delta^{-2} \log(n) \log(d_2)))$ and outputs $\alpha \in \mathcal{D}_{d_1}$ and $\beta \in \mathcal{D}_{d_2}$ such that

$$h_{\mathrm{Sep}}(M) \ge \mathrm{tr}[M(\alpha \otimes \beta)] \ge h_{\mathrm{Sep}}(M) - \delta. \tag{12}$$

For $n = \mathrm{poly}(d_1, d_2)$ this is the same running time as found in [BCY11] (while for $n \ll \mathrm{poly}(d_1, d_2)$ it is
an improvement). Later in this section we will show how we can always modify the measurement to have
$n = \mathrm{poly}(d_1, d_2)$ while incurring only a small error. But before that, we now show that Theorem 2 follows easily
from two simple lemmas.

One of the lemmas is a consequence of the well-known matrix Hoeffding bound.

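Lemma 3 (Matrix Hoeffding Bound [Tro10]). Suppose $Z_1, \ldots, Z_k$ are independent random $d \times d$ Hermitian
matrices satisfying $\mathbb{E}[Z_i] = 0$ and $\|Z_i\| \le \lambda$. Then

$$\Pr\left[ \Big\| \frac{1}{k} \sum_{i=1}^k Z_i \Big\| \ge \delta \right] \le d \cdot e^{-\frac{k \delta^2}{8 \lambda^2}}. \tag{13}$$

This is a special case of Theorem 2.8 from [Tro10].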

Our first lemma shows that one can restrict the optimization to a net of size $n^{O(\log(d_2)/\delta^2)}$.

Lemma 4. For any $p \in \Delta_n$ there exists $q \in \Delta_n(k)$ with

$$\|p - q\|_Y \le \sqrt{\frac{9 \ln(d_2)}{k}}. \tag{14}$$
Proof. Sample $i_1, \ldots, i_k$ according to $p$ and set $q = (e_{i_1} + \cdots + e_{i_k})/k$. Define $\bar{Y} := \sum_{i=1}^n p_i Y_i$ and $Z_j := \bar{Y} - Y_{i_j}$.
Observe that $\mathbb{E}[Z_j] = 0$ and $\|Z_j\| \le 1$. Then Lemma 3 implies that

$$\|p - q\|_Y = \Big\| \frac{1}{k} \sum_{j=1}^k Z_j \Big\| \le \delta \tag{15}$$

with positive probability if $k > 8 \ln(d_2)/\delta^2$. Setting $\delta = \sqrt{9 \ln(d_2)/k}$ we find that there exists a choice of
$q \in \Delta_n(k)$ satisfying Eq. (14). □
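The sampling argument in this proof can be tried out numerically. The sketch below is an illustration under a simplifying assumption that is not in the paper: the $Y_i$ are taken to be diagonal with entries in $[0,1]$, so that the operator norm of $\sum_i (p_i - q_i) Y_i$ is just its largest diagonal entry in absolute value. The names `sampled_error` and `Y_diag` are hypothetical.

```python
import math
import random

def sampled_error(p, Y_diag, k, rng):
    """Sample i_1..i_k from p, form q = (e_{i_1}+...+e_{i_k})/k, return ||p - q||_Y.

    Y_diag[i] holds the diagonal of Y_i; all Y_i are assumed diagonal with
    entries in [0, 1], which guarantees 0 <= Y_i <= I.
    """
    n, d2 = len(p), len(Y_diag[0])
    counts = [0] * n
    for i in rng.choices(range(n), weights=p, k=k):
        counts[i] += 1
    q = [c / k for c in counts]
    # For diagonal matrices the operator norm is the largest |diagonal entry|.
    return max(
        abs(sum((p[i] - q[i]) * Y_diag[i][b] for i in range(n)))
        for b in range(d2)
    )

def lemma4_bound(d2, k):
    """Right-hand side of Eq. (14)."""
    return math.sqrt(9 * math.log(d2) / k)
```

With, say, $n = 50$, $d_2 = 4$ and $k = 400$, a single random draw typically lands well inside the bound $\sqrt{9 \ln(d_2)/k}$, reflecting that the lemma only needs the bound to hold with positive probability.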

The second lemma shows that one can decide efficiently if an element of the net is a valid solution. A
similar result is in [SW12].

Lemma 5. Given $p \in \Delta_n$ and $\epsilon > 0$, we can decide in time $\mathrm{poly}(d_1, d_2, n)$ whether the following set is
nonempty:

$$S_X \cap \{q : \|p - q\|_Y \le \epsilon\}. \tag{16}$$

Proof. Both are convex sets, defined by semidefinite constraints. So we can test for feasibility with an SDP
of size $\mathrm{poly}(d_1, d_2, n)$. Indeed this is manifest for $S_X$ in Eq. (11), while $\{q : \|p - q\|_Y \le \epsilon\}$ can be written as

$$\{q : \|p - q\|_Y \le \epsilon\} = \Big\{ (q_1, \ldots, q_n) : q_i \ge 0,\ -\epsilon I \le \sum_i p_i Y_i - \sum_i q_i Y_i \le \epsilon I \Big\}. \tag{17}$$

□


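We are ready to prove Theorem 2.

Proof of Theorem 2. Whatever the output $x$ is, $x \le h_{\mathrm{Sep}}(M) + \delta/2$. On the other hand, let $q = \arg\max_{q \in S_X} \|q\|_Y$,
so that $\|q\|_Y = h_{\mathrm{Sep}}(M)$. By Lemma 4 there exists $p \in \Delta_n(k)$ with $\|p - q\|_Y \le \delta/2$. For this $p$ the check in step 1(a)
succeeds and returns some $q' \in S_X$ with $\|p - q'\|_Y \le \delta/2$, so by the triangle inequality $\|q'\|_Y \ge \|q\|_Y - \delta$. Thus our algorithm
will output a value that is $\ge h_{\mathrm{Sep}}(M) - \delta$. We conclude that the algorithm achieves an additive error of $\delta$
in time $\mathrm{poly}(d_1, d_2)\, n^{O(\log(d_2)/\delta^2)}$. □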
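To get a feel for the size of the enumeration, the following sketch (illustrative only; the name `net_parameters` and the rounding of $k$ up to an integer are assumptions made here) computes $k = \lceil 9 \ln(d_2)/\delta^2 \rceil$ and the base-2 logarithm of the number of points in $\Delta_n(k)$, which is $\binom{n+k-1}{k} \le n^k$.

```python
import math

def net_parameters(n, d2, delta):
    """Return (k, log2 |Delta_n(k)|) for k = ceil(9 ln(d2) / delta^2)."""
    k = math.ceil(9 * math.log(d2) / delta ** 2)
    # Points of Delta_n(k) correspond to multisets of size k drawn from [n].
    log2_size = math.log2(math.comb(n + k - 1, k))
    return k, log2_size
```

For instance, `net_parameters(16, 4, 0.5)` gives `k = 50`, and the reported logarithm is at most `k * log2(n) = 200`, consistent with the $n^{O(\log(d_2)/\delta^2)}$ bound on the number of candidates the algorithm examines.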

2.1 Sparsification 

We now consider the case where $n \gg \mathrm{poly}(d_1, d_2)$. It turns out that we can modify the algorithm such that
its running time is polynomial in $n$ by first sparsifying the number of local terms of the measurement. This
results in the following theorem.

Theorem 6. Let $M = \sum_{i=1}^n X_i \otimes Y_i$ be such that $\sum_i X_i \le I$, $X_i \ge 0$, $0 \le Y_i \le I$. Algorithm 8 runs in time
$\mathrm{poly}(n) \exp(O(\delta^{-2} \log d_1 \log(d_1 d_2)))$ and outputs $\alpha \in \mathcal{D}_{d_1}$ and $\beta \in \mathcal{D}_{d_2}$ such that

$$h_{\mathrm{Sep}}(M) \ge \mathrm{tr}(M(\alpha \otimes \beta)) \ge h_{\mathrm{Sep}}(M) - \delta. \tag{18}$$

The key element of the theorem is the following lemma.

Lemma 7. Given a 1-LOCC measurement $M = \sum_{i=1}^n X_i \otimes Y_i$ and some $\epsilon > 0$ there exists a 1-LOCC
measurement $M' = \sum_{i=1}^{n'} X_i' \otimes Y_i'$ with $\|M - M'\| \le \epsilon$ and $n' \le \mathrm{poly}(d_1, d_2)/\epsilon^2$. If the decomposition of
$M$ is explicitly given then $M'$ and its decomposition can be found in time $\mathrm{poly}(d_1, d_2, n)$ using a randomized
algorithm.

The modified algorithm is the following: 











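Algorithm 8 (Algorithm for computing $h_{\mathrm{Sep}}(M)$ for one-way LOCC $M = \sum_i X_i \otimes Y_i$).

Input: $\{X_i\}_{i=1}^n$, $\{Y_i\}_{i=1}^n$.

Output: States $\alpha \in \mathcal{D}_{d_1}$ and $\beta \in \mathcal{D}_{d_2}$.

1. Use Lemma 7 to replace $M = \sum_{i=1}^n X_i \otimes Y_i$ with $M' = \sum_{i=1}^{n'} X_i' \otimes Y_i'$ satisfying $\|M - M'\| \le \delta/2$.

2. Run Algorithm 1 on $M'$.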

The proof of correctness is straightforward. 

Proof of Theorem 6. Whatever the output $x$ is, $x \le h_{\mathrm{Sep}}(M') \le h_{\mathrm{Sep}}(M) + \delta/2$. On the other hand, let
$q = \arg\max_{q \in S_X} \|q\|_Y$ (with $S_X$ and $\|\cdot\|_Y$ defined with respect to $M'$), so that $\|q\|_Y = h_{\mathrm{Sep}}(M')$. By Lemma 4 there exists $p \in \Delta_n(k)$ with $\|p - q\|_Y \le \delta/2$.
Thus our algorithm will output a value that is $\ge h_{\mathrm{Sep}}(M') - \delta/2 \ge h_{\mathrm{Sep}}(M) - \delta$. We conclude that the
algorithm achieves an additive error of $\delta$ in time $\mathrm{poly}(n) (d_1 d_2)^{O(\log(d_2)/\delta^2)}$. □

It remains only to prove Lemma 7. This requires a careful use of the matrix Hoeffding bound (Lemma 3).
The details are in Appendix A.

2.2 Multipartite 

We now consider the generalization of the problem to the multipartite case. We consider measurements on
an $l$-partite vector space $\mathbb{C}^{d_1} \otimes \cdots \otimes \mathbb{C}^{d_l}$. Following Li and Smith [LS14], we define the class of fully one-way
LOCC measurements on $\mathbb{C}^{d_1} \otimes \cdots \otimes \mathbb{C}^{d_l}$ recursively as all measurements $M = \sum_i X_i \otimes M_i$, where $X_i \in \mathcal{H}_+^{d_1}$,
$\sum_i X_i \le I$, $M_i \in \mathcal{H}_+^{d_2 \cdots d_l}$, and each $M_i$ is a fully one-way LOCC measurement in $\mathbb{C}^{d_2} \otimes \cdots \otimes \mathbb{C}^{d_l}$.

Ref. [LS14] recently strengthened the result of [BH13] (from parallel one-way LOCC to fully one-way
LOCC measurement) and proved that the SoS hierarchy approximates

$$h_{\mathrm{Sep}(d_1,\ldots,d_l)}(M) := \max_{\alpha_1 \in \mathcal{D}_{d_1}, \ldots, \alpha_l \in \mathcal{D}_{d_l}} \mathrm{tr}[(\alpha_1 \otimes \cdots \otimes \alpha_l) M] \tag{19}$$

to within additive error $\delta$ in time $\exp(O(\log^2(d)\, l^3/\delta^2))$, with $d := \max_{i \in [l]} d_i$. Here we show that our previous
algorithm for the bipartite case can be extended to the multipartite setting to give the same run time.

Theorem 9. Algorithm 10 runs in time $\exp(O(l^3 \ln^2(d)/\delta^2))$ and outputs states $\alpha_i$, $i \in [l]$, satisfying

$$h_{\mathrm{Sep}(d_1,\ldots,d_l)}(M) \ge \mathrm{tr}[M(\alpha_1 \otimes \cdots \otimes \alpha_l)] \ge h_{\mathrm{Sep}(d_1,\ldots,d_l)}(M) - \delta. \tag{20}$$
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Algorithm 10 (Algorithm for computing hs e p{d 1 ,....di) (Af) for fully one-way LOCC M). 
Input: {X^_ : m £ [l\,ii £ [ni ],... ,i m £ [n m ]} C H d f n such that 


M = 


n i 

E 

ii=i 


X, 


( 1 ) 


£ -ri 2 , 1 


*2 = 1 


E y(m) 

t m =1 


Output: States oti £ T> ( i,, i £ [/]. 


1. t/se Lemma^ to replace M = ^^=1 E^giA/^ with M' = Y^i^=i{X^y®M[ i satisfying \\M—M'\\ < 

S/21. Here M ^ is a shorthand for the collection {X^ } for m > 2 and likewise for M[. Redefine 

M , { im }, { Hi} appropriately. 

2. Initialize the variables a±,... , a; to 0. 

3. Enumerate over all $p \in \Delta_n(k)$, with $k = 9 l^2 \ln(d)/\delta^2$. For each $p$,

(a) Check (using Lemma 5) whether there exists $q \in S_{X^{(1)}}$ with $\|\sum_i (p_i - q_i) M_i\| \le \delta/2l$.

(b) If no such $q$ exists then do not evaluate this value of $p$ any further. Otherwise let $\beta_1$ be the density matrix found in the SDP in Lemma 5 satisfying $q_i = \mathrm{tr}[\beta_1 X^{(1)}_i]$.

(c) For $m' \in \{2, \ldots, l\}$, $i_2 \in [n_2], \ldots, i_{m'} \in [n_{m'}]$, define $\tilde{X}^{(m')}_{i_2,\ldots,i_{m'}} := \sum_{i_1} q_{i_1} X^{(m')}_{i_1,i_2,\ldots,i_{m'}}$.

(d) Recursively call Algorithm 10 on input $\{\tilde{X}^{(m')}_{i_2,\ldots,i_{m'}}\}$. Denote the output by $\beta_2, \ldots, \beta_l$.

(e) If $\mathrm{tr}[M(\beta_1 \otimes \cdots \otimes \beta_l)] > \mathrm{tr}[M(\alpha_1 \otimes \cdots \otimes \alpha_l)]$ then replace $\alpha_1, \ldots, \alpha_l$ with $\beta_1, \ldots, \beta_l$.
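
The enumeration in step 3 can be illustrated with a short sketch. It assumes that $\Delta_n(k)$ denotes the probability vectors of the form $(e_{i_1} + \cdots + e_{i_k})/k$, which is how such nets arise in Maurey's empirical method; the function names `sparse_net` and `net_size` are illustrative and not part of the paper.

```python
from itertools import combinations_with_replacement
from fractions import Fraction
from math import comb

def sparse_net(n, k):
    """Yield every probability vector (e_{i_1} + ... + e_{i_k}) / k in dimension n."""
    for multiset in combinations_with_replacement(range(n), k):
        counts = [0] * n
        for i in multiset:
            counts[i] += 1
        # Exact rational entries, so each vector sums to exactly 1.
        yield tuple(Fraction(c, k) for c in counts)

def net_size(n, k):
    """Number of multisets of size k over n symbols; never more than n**k."""
    return comb(n + k - 1, k)
```

Under this assumption the net has at most $n^k$ points, which is where a run time of the form $\exp(O(k \log n))$ comes from.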


2.3 The need for an explicit decomposition 

The input to our algorithm is not only a 1-LOCC measurement $M$ but an explicit decomposition of the form
$M = \sum_i X_i \otimes Y_i$ with each $X_i \ge 0$. Previous algorithms for $h_{\mathrm{Sep}}$ were mostly based on the SoS hierarchy
(or its restriction to the separability problem, also known as the $k$-extendible hierarchy) [DPS04]. Running these
requires only knowledge of $M$ and not its decomposition. The decomposition appears in the analysis of
[BCY11, LW14, BC11, BH13, Yan06], but not the algorithm.

On the other hand, previous algorithms did not yield an explicit rounding, i.e. a separable state $\sigma$ with
$\mathrm{tr}\, M\sigma \approx h_{\mathrm{Sep}}(M)$. The only exception to this [BH13] also required an explicit decomposition in order to
produce a rounding.

In general any bipartite measurement $M$ can be written in the form $\sum_i X_i \otimes Y_i$, with individual terms
that are not necessarily positive semidefinite. Finding some such decomposition is straightforward, e.g. using
the operator Schmidt decomposition or even writing $M = \sum_{ijkl} M_{ijkl}\, |i\rangle\langle j| \otimes |k\rangle\langle l|$. Our algorithm can be
readily modified to incorporate non-positive $X_i$ (along the lines of Section 4), but the run-time will then
include a factor of $\sum_i \|X_i\|_1$ in the exponent. In general this will be $O(1)$ only if $M$ is close to 1-LOCC and
the decomposition is close to the correct one.
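
As a small illustration of the "straightforward" decomposition mentioned above, the following sketch splits a bipartite matrix into terms $|i\rangle\langle j| \otimes Y_{ij}$, where $Y_{ij}$ is the $(i,j)$ block of $M$. The helper names are hypothetical, and matrices are plain nested lists to keep the example dependency-free.

```python
def kron(X, Y):
    """Kronecker product of two square matrices given as nested lists."""
    dx, dy = len(X), len(Y)
    return [[X[i // dy][j // dy] * Y[i % dy][j % dy]
             for j in range(dx * dy)] for i in range(dx * dy)]

def block_decomposition(M, d1, d2):
    """Return terms (X, Y) with M = sum of X (x) Y, X = |i><j|, Y = block (i, j)."""
    terms = []
    for i in range(d1):
        for j in range(d1):
            X = [[1 if (a, b) == (i, j) else 0 for b in range(d1)]
                 for a in range(d1)]
            Y = [[M[i * d2 + k][j * d2 + l] for l in range(d2)]
                 for k in range(d2)]
            terms.append((X, Y))
    return terms

def reassemble(terms, d1, d2):
    """Sum the Kronecker products of all terms back into one matrix."""
    D = d1 * d2
    total = [[0] * D for _ in range(D)]
    for X, Y in terms:
        K = kron(X, Y)
        for a in range(D):
            for b in range(D):
                total[a][b] += K[a][b]
    return total
```

The off-diagonal terms $|i\rangle\langle j|$ with $i \ne j$ are not positive semidefinite, which is exactly why such a generic decomposition does not directly satisfy the positivity requirement on the $X_i$.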

This raises an interesting open question: given $M$, find a decomposition $M = \sum_i X_i \otimes Y_i$ that (approximately)
minimizes $\sum_i \|X_i\|_1$. We are not aware of nontrivial algorithms or hardness results for this
problem.
3 Generalized algorithm for arbitrary norms

An important step in the algorithm of the previous section was the identity

$$h_{\mathrm{Sep}(d_1,d_2)}(M) = \max_{p \in S_X} \|p\|_Y, \tag{21}$$

valid for any one-way LOCC $M = \sum_i X_i \otimes Y_i$. This equation suggests ways of generalizing the algorithm.
In this section we consider the setting where the operators $\{Y_1, \ldots, Y_n\}$ belong to some Banach space $B$ with
norm $\|\cdot\|_B$. In analogy with the norm $\|\cdot\|_Y$ appearing in Eq. (21), given $Y = \{Y_1, \ldots, Y_n\}$ we define the $(B, Y)$ norm in $\mathbb{R}^n$ as

$$\|a\|_{B,Y} := \left\| \sum_{i=1}^n a_i Y_i \right\|_B. \tag{22}$$

The goal is then to estimate

$$\max_{p \in S_X} \|p\|_{B,Y}, \tag{23}$$

where, as before, $S_X$ is given by Eq. (11).


This generalization is also of interest in quantum information theory. As we discuss more in the next
subsection, it includes as a particular case the well-studied problem of computing the maximum output
$\alpha$-norms of an entanglement-breaking channel. Consider a general entanglement-breaking quantum channel
$\Lambda : \mathcal{D}_{d_1} \to \mathcal{D}_{d_2}$ given by [HSR04]:

$$\Lambda(\rho) := \sum_i \mathrm{tr}(X_i \rho)\, Y_i \tag{24}$$

with $Y_i \ge 0$, $\mathrm{tr}(Y_i) = 1$, $X_i \ge 0$, and $\sum_i X_i = I$. Then

$$\max_{\rho \in \mathcal{D}_{d_1}} \|\Lambda(\rho)\|_{\alpha} = \max_{p \in S_X} \|p\|_{S_\alpha, Y}. \tag{25}$$


In order to find an algorithm for computing Eq. (23), we need to replace the quantum Hoeffding bound
(Lemma 3) by more sophisticated concentration bounds. Since in Lemma 5 all we needed was a bound in
expectation, the right concept will turn out to be the Rademacher type-$\gamma$ constant of the space $B$, which we
now define:

Definition 11. We say a Banach space $B$ has Rademacher type-$\gamma$ constant $C$ if for every $Z_1, \ldots, Z_k \in B$
and Rademacher random variables $\varepsilon_1, \ldots, \varepsilon_k$ (i.e. independent and uniformly distributed on $\pm 1$),

$$\mathbb{E}_{\varepsilon_1,\ldots,\varepsilon_k} \left\| \sum_{i=1}^k \varepsilon_i Z_i \right\|_B^{\gamma} \le C^{\gamma} \sum_{i=1}^k \|Z_i\|_B^{\gamma}. \tag{26}$$

It is known that Schatten-$\alpha$ spaces with norm $\|X\|_\alpha := \mathrm{tr}(|X|^\alpha)^{1/\alpha}$ have type-2 constant $\sqrt{\alpha - 1}$ for
$\alpha \ge 2$ [BCL94], and type-$\alpha$ constant 1 for every $\alpha \in [1, 2]$ [KPT00, Thm 3.3].

For a reader unfamiliar with the type-$\gamma$ constant, we suggest verifying that the type-2 constant of $\ell_2$
is 1. A more nontrivial calculation is using the Hoeffding bound or its operator version to verify that the
type-2 constant of $\ell_\infty^n$ or $S_\infty^n$ is $O(\sqrt{\log n})$. (This also follows from the fact that the $S_\infty$ and $S_{\log(n)}$ norms
are within a constant multiple of each other on the space of $n$-dimensional matrices.)

For sparsification (the analogue of Lemma 7) we will actually need a slightly stronger condition than a
bound on the type-$\gamma$ constant:

Definition 12. The modulus of uniform smoothness of a Banach space $B$ is defined to be the function

$$\rho_B(\tau) := \sup\left\{ \frac{\|x + \tau y\|_B + \|x - \tau y\|_B}{2} - 1 : \|x\|_B = \|y\|_B = 1 \right\}. \tag{27}$$

By the triangle inequality, $\rho_B(\tau) \le \tau$ for all $B$. But when $\lim_{\tau \to 0} \rho_B(\tau)/\tau = 0$ then we say that $B$ is
uniformly smooth. For example, if $B = \ell_2$ then $\rho_B(\tau) = \sqrt{1 + \tau^2} - 1 \le \tau^2/2$, whereas $\rho_{\ell_1}(\tau) = \tau$. More generally [BCL94]
(building on [TJ74]) proved that $\rho_{S_\alpha}(\tau) \le \frac{\alpha - 1}{2}\tau^2$ for $\alpha > 1$. We say that $B$ has modulus of smoothness of
power type $\gamma$ if $\rho_B(\tau) \le C\tau^\gamma$ for some constant $C$. This implies (using an easy induction on $k$) that the
type-$\gamma$ constant is $\le C$, and indeed this was how the type-$\gamma$ constant was bounded in [TJ74, BCL94].
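
For readers who want to try the suggested verification numerically, here is a minimal sketch restricted to the Euclidean case. It checks that $\mathbb{E}\|\sum_i \varepsilon_i z_i\|_2^2 = \sum_i \|z_i\|_2^2$ by averaging over all sign patterns (so the type-2 constant of $\ell_2$ is 1), and it estimates the modulus of smoothness of the Euclidean plane on a grid of directions. The function names and the grid resolution are illustrative assumptions.

```python
from itertools import product
from math import sqrt, cos, sin, pi

def l2(v):
    """Euclidean norm of a vector."""
    return sqrt(sum(x * x for x in v))

def rademacher_average_sq(zs):
    """Exact E || sum_i eps_i z_i ||_2^2 over all 2^k sign patterns."""
    k, dim = len(zs), len(zs[0])
    total = 0.0
    for eps in product((-1, 1), repeat=k):
        s = [sum(e * z[c] for e, z in zip(eps, zs)) for c in range(dim)]
        total += l2(s) ** 2
    return total / 2 ** k

def smoothness_modulus_l2_plane(tau, steps=360):
    """Grid estimate of rho(tau) for the Euclidean plane."""
    best = 0.0
    x = (1.0, 0.0)  # by rotation invariance x can be fixed
    for m in range(steps):
        t = 2 * pi * m / steps
        y = (cos(t), sin(t))
        plus = l2((x[0] + tau * y[0], x[1] + tau * y[1]))
        minus = l2((x[0] - tau * y[0], x[1] - tau * y[1]))
        best = max(best, (plus + minus) / 2 - 1)
    return best
```

In the Euclidean case the supremum is attained when $y$ is orthogonal to $x$, giving $\sqrt{1+\tau^2} - 1$, which the grid estimate reproduces up to rounding.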

The algorithm for approximating the optimization problem given by Eq. (23) is the following:

Algorithm 13 (Algorithm for computing $\max_{p \in S_X} \|p\|_{B,Y}$ for $B$ of type-$\gamma$ constant $C$ and modulus of
uniform smoothness $\rho_B(\tau) \le s\tau^2$, with $X := \{X_i\}$ and $Y := \{Y_i\}$).

Input: $\{X_i\}_{i=1}^n$, $\{Y_i\}_{i=1}^n$

Output: $p \in S_X$

1. Use Lemma 22 to replace $X = \{X_i\}$ and $Y = \{Y_i\}$ with $X' := \{X'_i\}$ and $Y' := \{Y'_i\}$.

2. Enumerate over all $p \in \Delta_n(k)$, with $k = \left(\frac{2C}{\delta} \max_i \|Y_i\|_B\right)^{\gamma/(\gamma - 1)}$.

(a) For each $p$, check (using Lemma 21) whether there exists $q \in S_X$ with $\|p - q\|_{B,Y} \le \delta$.

(b) If so, compute $\|p\|_{B,Y}$.

3. Output $p$ such that $\|p\|_{B,Y}$ is the maximum.
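
The enumeration and maximisation steps can be sketched as follows. This is a toy illustration, not the paper's implementation: the $Y_i$ are real vectors, the norm is passed in as a function, and the feasibility check of step 2(a) is abstracted into a hypothetical oracle `feasible` that accepts everything by default. That default is only appropriate in the special case $X_i = |i\rangle\langle i|$, where $S_X$ is the whole simplex.

```python
from itertools import combinations_with_replacement

def lq_norm(v, q):
    """The l_q norm of a real vector."""
    return sum(abs(x) ** q for x in v) ** (1.0 / q)

def combine(p, Y):
    """Return sum_i p_i Y_i for vectors Y_i."""
    dim = len(Y[0])
    return [sum(pi * y[c] for pi, y in zip(p, Y)) for c in range(dim)]

def net_search(Y, k, norm, feasible=lambda p: True):
    """Maximise norm(sum_i p_i Y_i) over k-sparse empirical distributions p."""
    n = len(Y)
    best_p, best_val = None, float("-inf")
    for multiset in combinations_with_replacement(range(n), k):
        p = [0.0] * n
        for i in multiset:
            p[i] += 1.0 / k
        if not feasible(p):  # stands in for the check of step 2(a)
            continue
        val = norm(combine(p, Y))
        if val > best_val:
            best_p, best_val = p, val
    return best_p, best_val
```

With the default oracle the maximum is attained at a vertex of the simplex, since a norm is convex; a nontrivial `feasible` would have to solve the convex feasibility problem of Lemma 21.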


We have:

Theorem 14. Let $B$ be a Banach space with norm $\|\cdot\|_B$. Suppose the type-$\gamma$ constant of $B$ is $C$ and that
there is $s > 0$ such that the modulus of uniform smoothness satisfies $\rho_B(\tau) \le s\tau^2$. Suppose one can compute
$\|\cdot\|_B$ in time $T$. Consider $\{X_i\}_{i=1}^n$ with $X_i$ $d \times d$ matrices satisfying $X_i \ge 0$, $\sum_i X_i \le I$, and $\{Y_i\}_{i=1}^n$ with
$Y_i \in B$. Algorithm 13 runs in time

$$\mathrm{poly}(T, d, s) \exp\left( \left(C \delta^{-1} \max_i \|Y_i\|_B\right)^{\gamma/(\gamma - 1)} \log(d) \right) \tag{28}$$

and outputs $p$ such that

$$\max_{p' \in S_X} \|p'\|_{B,Y} \ge \|p\|_{B,Y} \ge \max_{p' \in S_X} \|p'\|_{B,Y} - \delta. \tag{29}$$
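
To get a feel for how the parameters enter the run time, the following sketch evaluates the sparsity parameter of the net, reading step 2 of Algorithm 13 as $k = ((2C/\delta) \max_i \|Y_i\|_B)^{\gamma/(\gamma-1)}$ rounded up. That reading, the rounding, and the function names are assumptions made for illustration.

```python
from math import ceil, log

def net_sparsity(C, delta, max_norm, gamma=2.0):
    """k = ((2C/delta) * max_i ||Y_i||)^(gamma/(gamma-1)), rounded up."""
    if gamma <= 1:
        raise ValueError("the bound needs gamma > 1")
    return ceil(((2.0 * C / delta) * max_norm) ** (gamma / (gamma - 1.0)))

def log_net_size_bound(n, k):
    """Natural log of the crude bound n**k on the number of net points."""
    return k * log(n)
```

The exponent $\gamma/(\gamma-1)$ grows without bound as $\gamma \to 1$, so spaces with only weak type yield much larger nets.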


As an example, suppose $B$ is $S_\infty$. Then the type-2 constant is $O(\sqrt{\log(d_2)})$, $\max_i \|Y_i\| \le 1$, and Theorem
2 shows one can compute $\max_{p \in S_X} \|p\|_Y$ in time $\exp(O(\delta^{-2} \log(d_1) \log(d_1 d_2)))$.

In the next subsection we discuss a few particular cases of the theorem worth emphasizing. Then we 
prove the theorem. 


3.1 Consequences of Theorem 14 


3.1.1 Restricted one-way LOCC measurements 


The next corollary shows that for subclasses of one-way LOCC measurements one has a PTAS for computing
$h_{\mathrm{Sep}}$. The class includes in particular one-way LOCC measurements in which Bob's measurements are low
rank.

Corollary 15. Let $M = \sum_i X_i \otimes Y_i$ be such that $X_i \ge 0$, $\sum_i X_i \le I$ and $\|Y_i\|_2 \le r$. Then one can compute
$\alpha \in \mathcal{D}_{d_1}$ and $\beta \in \mathcal{D}_{d_2}$ such that

$$h_{\mathrm{Sep}}(M) \ge \mathrm{tr}(M(\alpha \otimes \beta)) \ge h_{\mathrm{Sep}}(M) - \delta \tag{30}$$

in time $d_1^{O(\delta^{-2} r)}$.
Proof. We use Theorem 14 and Algorithm 13 to estimate the optimal $p$ and then find $\alpha$ and $\beta$ by semidefinite
programming. □

If instead we use the multipartite version of the algorithm (see Algorithm 10), we find that for $M =
\sum_i X_i \otimes Y_{i,1} \otimes \cdots \otimes Y_{i,l}$, with $X_i \ge 0$, $\sum_i X_i \le I$ and $\|Y_{i,j}\|_2 \le r$, we can compute $\alpha \in \mathcal{D}_d$ and $\beta_1, \ldots, \beta_l \in \mathcal{D}_d$
such that

$$h_{\mathrm{Sep}}(M) \ge \mathrm{tr}(M \alpha \otimes \beta_1 \otimes \cdots \otimes \beta_l) \ge h_{\mathrm{Sep}}(M) - \delta \tag{31}$$

in time $d^{O(\delta^{-2} l^3 r)}$.


3.1.2 Maximum output norm of entanglement-breaking channels 

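The next corollary shows that for all $\alpha > 1$, there is a PTAS for computing the maximum output Schatten-$\alpha$
norm of an entanglement-breaking channel.

Corollary 16. Let $\Lambda : \mathcal{D}_{d_1} \to \mathcal{D}_{d_2}$ be an entanglement-breaking channel with decomposition $\Lambda(\rho) :=
\sum_i \mathrm{tr}(X_i \rho)\, Y_i$ (where $X_i \ge 0$, $\sum_i X_i = I$, $Y_i \in \mathcal{D}_{d_2}$).

1. For every $\alpha \ge 2$ one can compute in time $\mathrm{poly}(d_2)\, d_1^{O(\delta^{-2} \alpha)}$ a number $r$ such that

$$\max_{\rho \in \mathcal{D}_{d_1}} \|\Lambda(\rho)\|_\alpha \ge r \ge \max_{\rho \in \mathcal{D}_{d_1}} \|\Lambda(\rho)\|_\alpha - \delta. \tag{32}$$

2. For every $1 < \alpha < 2$ one can compute in time $\mathrm{poly}(d_2)\, d_1^{O(\delta^{-\alpha/(\alpha-1)})}$ a number $r$ such that

$$\max_{\rho \in \mathcal{D}_{d_1}} \|\Lambda(\rho)\|_\alpha \ge r \ge \max_{\rho \in \mathcal{D}_{d_1}} \|\Lambda(\rho)\|_\alpha - \delta. \tag{33}$$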

Proof. Part 1 follows from Theorem 14 and the fact that $S_\alpha$, with $\alpha \ge 2$, has type-2 constant $\sqrt{\alpha - 1}$
[BCL94] and $\rho_{S_\alpha}(\tau) \le \frac{\alpha - 1}{2}\tau^2$ for $\alpha > 1$. Part 2, in turn, follows from Theorem 14 and the fact that $S_\alpha$,
with $1 < \alpha < 2$, has type-$\alpha$ constant one [KPT00, Thm 3.3]. □


We note that computing maximum output $\alpha$-norms for general quantum channels is harder. In particular
it was shown in [HM10, HM13] that there is no algorithm that runs in time $\exp(O(\log^{2-\epsilon} d))$ for any $\epsilon > 0$
and can decide if $\max_\rho \|\Lambda(\rho)\|$ is one or smaller than $\delta$ (for any fixed $\delta > 0$) for a general quantum channel
$\Lambda : \mathcal{D}_d \to \mathcal{D}_{d_1}$ unless the exponential time hypothesis (ETH) is wrong (meaning there is a subexponential
time algorithm for 3-SAT).

The result of [HM10] is one example of many that found $d^{O(\log d)}$ upper or lower bounds for related
optimization problems [LMM03, BKW14, HM10]. In a few cases [ALSV13, SW12, DKLP06] poly-time
approximation schemes (PTASs) are known. Our results here fall into this second class. We hope that the
geometric perspective from our paper can lead to a better understanding of what distinguishes these cases.

What is known about hardness results for entanglement-breaking channels? Using the results of [BBH+12]
one can show that determining whether $\max_\rho \|\Lambda(\rho)\|$ is $\ge C/d$ or $\le c/d$ (for any two constants $C > c > 0$) cannot
be done in time $\exp(O(\log^{2-\epsilon} d))$ assuming ETH. So one cannot hope to find a polynomial-time algorithm
for a multiplicative approximation of the maximum output norm.

Note that the complexity of the algorithm blows up when $\alpha \to 1$. This is not only an artifact of the proof.
Computing the quantity for $\alpha$ close to one allows us to estimate the von Neumann minimum output entropy
of the channel. However to estimate it we need a number of samples of order $O(d)$ and so the net-based
approach we explore in this paper does not lead to efficient algorithms.
3.1.3 Hypercontractive norms

Our third corollary concerns the problem of computing hypercontractive norms, in particular computing the
$2 \to s$ norm of a $d \times d$ matrix $A$, for $s > 2$, defined as

$$\|A\|_{2 \to s} := \max_{\|x\|_2 = 1} \|Ax\|_s. \tag{34}$$
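
To make the definition concrete, the sketch below estimates the $2 \to s$ norm of a real $2 \times 2$ matrix by brute force over a grid of unit vectors. It is only an illustration of Eq. (34) in the smallest nontrivial case; the function names and grid size are assumptions, and the approach does not scale to large $d$.

```python
from math import cos, sin, pi

def ls_norm(v, s):
    """The l_s norm of a real vector."""
    return sum(abs(x) ** s for x in v) ** (1.0 / s)

def two_to_s_norm_2d(A, s, steps=3600):
    """Grid estimate of max ||A x||_s over unit vectors x in R^2."""
    best = 0.0
    for m in range(steps):
        # x and -x give the same value, so half a circle suffices.
        t = pi * m / steps
        x = (cos(t), sin(t))
        Ax = [row[0] * x[0] + row[1] * x[1] for row in A]
        best = max(best, ls_norm(Ax, s))
    return best
```

Since $\|v\|_s \le \|v\|_2$ for $s \ge 2$, the estimate for any $s \ge 2$ is never larger than the one for $s = 2$, which approximates the largest singular value.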


These norms are important in several applications, e.g. bounding the mixing time of Markov chains and
determining if a graph is a small-set expander [BBH+12]. In [BBH+12] it was also shown that computing
any constant-factor multiplicative approximation to the $2 \to 4$ norm of an $n \times n$ matrix is as hard as solving
3-SAT with $O(\log^2(n))$ variables. In Appendix B we extend the approach of [BBH+12] to show hardness
results for multiplicatively approximating all $2 \to q$ norms, for even $q \ge 4$.

In [BBH+12] it was shown that the result of [BCY11] implies that for any $d \times d$ matrix $A$ the Sum-of-
Squares hierarchy computes in quasipolynomial time an additive approximation $x$ s.t.

$$\|A\|^4_{2 \to 4} \le x \le \|A\|^4_{2 \to 4} + \delta \|A\|^2_{2 \to 2} \|A\|^2_{2 \to \infty}, \tag{35}$$

where $\|A\|_{2 \to 2}$ is the largest singular value of $A$ and $\|A\|_{2 \to \infty}$ the largest 2-norm of any row of $A$.

Using Theorem 14 we can improve this algorithm in two ways: First we can compute an approximation
to $\|\cdot\|_{2 \to s}$ for any $s > 2$. Second the running time for fixed error is polynomial, instead of quasipolynomial.

Corollary 17. For any $s > 2$ one can compute in time $d^{O(s\delta^{-2})}$ a number $x$ such that

$$\|A\|^2_{2 \to s} \ge x \ge \|A\|^2_{2 \to s} - \delta \|A\|^2_{2 \to 2}. \tag{36}$$

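Proof. Let $X_i := A^\dagger |i\rangle\langle i| A / \|A\|^2_{2 \to 2}$. Note $X_i \ge 0$ and $\sum_i X_i \le I$. We can write

$$\|A\|^2_{2 \to s} = \max_{\|x\|_2 = 1} \left\| \left( \langle x| A^\dagger |i\rangle\langle i| A |x\rangle \right)_i \right\|_{s/2} = \|A\|^2_{2 \to 2} \max_{p \in S_X} \|p\|_{s/2}. \tag{37}$$

Since $\ell_s$ has type-2 constant $\sqrt{s - 1}$, by Theorem 14 we can estimate

$$\max_{p \in S_X} \|p\|_{s/2} = \frac{\|A\|^2_{2 \to s}}{\|A\|^2_{2 \to 2}} \tag{38}$$

in time $\exp(O(s\delta^{-2} \log(d)))$ with additive error $\delta$.

□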
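
The change of variables used in this proof can be checked numerically for a single unit vector. The sketch below works with real matrices, so $A^\dagger = A^T$, and takes the operator norm as an input rather than computing it; the function names are illustrative. With $X_i = A^T |i\rangle\langle i| A / \|A\|^2_{2\to 2}$ one has $p_i = \langle x|X_i|x\rangle = |(Ax)_i|^2 / \|A\|^2_{2 \to 2}$, hence $\|A\|^2_{2\to 2}\, \|p\|_{s/2} = \|Ax\|_s^2$.

```python
def ls_norm(v, s):
    """The l_s norm of a real vector."""
    return sum(abs(t) ** s for t in v) ** (1.0 / s)

def outcome_distribution(A, x, op_norm):
    """Return (p, Ax) with p_i = |(Ax)_i|^2 / op_norm^2 for a real matrix A."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    p = [(y * y) / op_norm ** 2 for y in Ax]
    return p, Ax
```

Because $\sum_i X_i \le I$, the entries of $p$ sum to at most 1 for every unit vector $x$.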


Although the corollary above gives an approximation for every $s > 2$ that can be computed in polynomial
time for every fixed error, it gives a worse approximation to the $2 \to 4$ norm than [BBH+12] (given by Eq. (35)).
We now show a second corollary that strictly improves the result of [BBH+12] for $2 \to 4$ and generalizes it
to $2 \to s$ norms for every even $s \ge 4$.

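Corollary 18. For any even $s \ge 4$ one can compute in time $d^{O(s\delta^{-2})}$ a number $x$ such that

$$\|A\|^s_{2 \to s} \ge x \ge \|A\|^s_{2 \to s} - \delta \|A\|^2_{2 \to 2} \|A\|^{s-2}_{2 \to \infty}. \tag{39}$$

Proof. Define

$$X_i := \frac{A^\dagger |i\rangle\langle i| A}{\|A\|^2_{2 \to 2}} \quad \text{and} \quad Y_i := \frac{A^\dagger |i\rangle\langle i| A}{\|A\|^2_{2 \to \infty}}. \tag{40}$$

Observe that $X_i, Y_i \ge 0$, $\sum_i X_i \le I$ and $Y_i \le I$. Additionally

$$\|A\|^s_{2 \to s} = \|A\|^2_{2 \to 2} \|A\|^{s-2}_{2 \to \infty}\, h_{\mathrm{Sep}^{s/2}(n)}\left( \sum_i X_i \otimes Y_i^{\otimes (s/2 - 1)} \right). \tag{41}$$

This last term can be approximated to additive error $\epsilon$ in time

$$\exp(O(s^3/\epsilon^2))$$

using the multipartite results of Section 3.1.1.

□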
3.2 Proof of Theorem 14

The proof of Theorem 14 will follow from three lemmas, the first showing that it is enough to search over
a net of small size, the second showing that one can decide membership of $\{q : \|p - q\|_{B,Y} \le \delta\}$ efficiently
(assuming that $\|\cdot\|_B$ can be computed efficiently), and the third giving a sparsification for the number of
$\{X_i\}_{i=1}^n$ and $\{Y_i\}_{i=1}^n$.

We first show how the type-$\gamma$ constant gives a concentration bound. This uses a standard argument.

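Lemma 19 (Symmetrization Lemma). Suppose we are given $p \in \Delta_n$, $Z_1, \ldots, Z_n$ elements of a Banach space
$B$ with norm $\|\cdot\|_B$, and $\varepsilon_1, \ldots, \varepsilon_k$ Rademacher distributed random variables. Then for every $\gamma \ge 1$

$$\mathbb{E}_{i_1,\ldots,i_k \sim p^{\otimes k}} \left\| \frac{1}{k} \sum_{j=1}^k Z_{i_j} - \mathbb{E}_{i \sim p} Z_i \right\|_B^\gamma \le 2^\gamma\, \mathbb{E}_{i_1,\ldots,i_k \sim p^{\otimes k}}\, \mathbb{E}_{\varepsilon_1,\ldots,\varepsilon_k} \left\| \frac{1}{k} \sum_{j=1}^k \varepsilon_j Z_{i_j} \right\|_B^\gamma.$$

Proof. Write $i = (i_1, \ldots, i_k)$ and let $i' = (i'_1, \ldots, i'_k)$ be an independent copy of $i$. Then

$$\mathbb{E}_{i} \left\| \frac{1}{k} \sum_{j=1}^k Z_{i_j} - \mathbb{E}_{i \sim p} Z_i \right\|_B^\gamma = \mathbb{E}_{i} \left\| \frac{1}{k} \sum_{j=1}^k \left( Z_{i_j} - \mathbb{E}_{i'_j} Z_{i'_j} \right) \right\|_B^\gamma \tag{42}$$

$$\le \mathbb{E}_{i, i'} \left\| \frac{1}{k} \sum_{j=1}^k \left( Z_{i_j} - Z_{i'_j} \right) \right\|_B^\gamma \tag{43}$$

$$= \mathbb{E}_{i, i'}\, \mathbb{E}_{\varepsilon} \left\| \frac{1}{k} \sum_{j=1}^k \varepsilon_j \left( Z_{i_j} - Z_{i'_j} \right) \right\|_B^\gamma \tag{44}$$

$$\le \mathbb{E}_{i, i'}\, \mathbb{E}_{\varepsilon} \left( \left\| \frac{1}{k} \sum_{j=1}^k \varepsilon_j Z_{i_j} \right\|_B + \left\| \frac{1}{k} \sum_{j=1}^k \varepsilon_j Z_{i'_j} \right\|_B \right)^\gamma \tag{45}$$

$$\le 2^{\gamma - 1}\, \mathbb{E}_{i, i'}\, \mathbb{E}_{\varepsilon} \left( \left\| \frac{1}{k} \sum_{j=1}^k \varepsilon_j Z_{i_j} \right\|_B^\gamma + \left\| \frac{1}{k} \sum_{j=1}^k \varepsilon_j Z_{i'_j} \right\|_B^\gamma \right) \tag{46}$$

$$= 2^\gamma\, \mathbb{E}_{i_1,\ldots,i_k \sim p^{\otimes k}}\, \mathbb{E}_{\varepsilon_1,\ldots,\varepsilon_k} \left\| \frac{1}{k} \sum_{j=1}^k \varepsilon_j Z_{i_j} \right\|_B^\gamma. \tag{47}$$

□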


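Then we have the following generalization of Lemma 5.

Lemma 20. Let the Banach space $B$ have type-$\gamma$ constant $C$. Then for any $p \in \Delta_n$ there exists $q \in \Delta_n(k)$ with

$$\|p - q\|_{B,Y} \le \left( \frac{(2C)^\gamma}{k^{\gamma - 1}} \sum_i p_i \|Y_i\|_B^\gamma \right)^{1/\gamma}. \tag{48}$$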


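Proof. Sample $i_1, \ldots, i_k$ according to $p$ and set $q = (e_{i_1} + \cdots + e_{i_k})/k$. Then Definition 11 and Lemma 19
give

$$\begin{aligned}
\left( \mathbb{E} \|p - q\|_{B,Y} \right)^\gamma &\le \mathbb{E} \|p - q\|^\gamma_{B,Y} = \mathbb{E}_{i_1,\ldots,i_k \sim p^{\otimes k}} \left\| \frac{1}{k} \sum_{j=1}^k Y_{i_j} - \mathbb{E}_{i \sim p} Y_i \right\|_B^\gamma \\
&\le 2^\gamma\, \mathbb{E}_{i_1,\ldots,i_k}\, \mathbb{E}_{\varepsilon_1,\ldots,\varepsilon_k} \left\| \frac{1}{k} \sum_{j=1}^k \varepsilon_j Y_{i_j} \right\|_B^\gamma \\
&\le \frac{(2C)^\gamma}{k^{\gamma - 1}} \sum_i p_i \|Y_i\|_B^\gamma.
\end{aligned} \tag{49}$$

The first inequality follows from the convexity of $x \mapsto x^\gamma$, the second inequality from Lemma 19 and the
third from the fact that $B$ has type-$\gamma$ constant $C$. □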
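
The sampling step of this proof is easy to examine exactly in the Euclidean case. The sketch below assumes $B = \ell_2$ with real vectors $Y_i$ and enumerates all $n^k$ outcomes of the $k$ draws, so it is only feasible for tiny $n$ and $k$; the function names are illustrative. In that case the expected squared error equals the variance of a single draw divided by $k$, which is at most $\frac{1}{k}\sum_i p_i \|Y_i\|_2^2$ and so shows the same $k^{-(\gamma-1)}$ decay with $\gamma = 2$.

```python
from itertools import product

def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(x * x for x in v)

def mix(weights, Y):
    """Return sum_i weights_i * Y_i for vectors Y_i."""
    return [sum(w * y[c] for w, y in zip(weights, Y)) for c in range(len(Y[0]))]

def expected_sq_error(p, Y, k):
    """Exact E ||sum_i (p_i - q_i) Y_i||_2^2 when q averages k i.i.d. draws from p."""
    n = len(p)
    target = mix(p, Y)
    total = 0.0
    for draws in product(range(n), repeat=k):
        prob = 1.0
        q = [0.0] * n
        for i in draws:
            prob *= p[i]
            q[i] += 1.0 / k
        approx = mix(q, Y)
        total += prob * sq_norm([a - b for a, b in zip(approx, target)])
    return total
```

Since the expectation is small, at least one outcome $q$ achieves an error no larger than it, which is the existence statement of the lemma.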


The next lemma is an analogue of Lemma 5.

Lemma 21. Let the Banach space $B$ be such that $\|\cdot\|_B$ can be computed in time $T$. Given $p \in \Delta_n$ and $\epsilon > 0$,
we can decide in time $\mathrm{poly}(T, d, n)$ whether the following set is nonempty:

$$S_X \cap \{q : \|p - q\|_{B,Y} \le \epsilon\}. \tag{50}$$

Proof. Since we have an efficient algorithm for $\|\cdot\|_B$ we can efficiently test membership in the set $\{q :
\|p - q\|_{B,Y} \le \epsilon\}$. Thus we can determine if Eq. (50) is nonempty using the ellipsoid algorithm [GLS93]. □

We now state a sparsification result, analogous to Lemma 7, for the more general case we consider in this
section. The proof is in Appendix A.

Lemma 22. Suppose $\Lambda$ is a map from $d \times d$ Hermitian matrices to a Banach space $B$ and is given by
$\Lambda(\rho) = \sum_{i=1}^n \langle X_i, \rho \rangle Y_i$ where each $X_i \ge 0$, $\sum_{i=1}^n X_i \le I$ and each $\|Y_i\|_B \le 1$. Suppose that $B$ has modulus
of smoothness $\rho_B(\tau) \le s\tau^2$. Then there exists $\Lambda'$ such that $\Lambda'(\rho) = \sum_{i=1}^k \langle X'_i, \rho \rangle Y'_i$ where each $X'_i \ge 0$,
$\sum_{i=1}^k X'_i \le I$ and each $\|Y'_i\|_B \le 1$. Additionally $k \le c d^2 (d + s)/\delta^2$ for some constant $c > 0$,

$$\max_{\rho \in \mathcal{D}_d} \|(\Lambda' - \Lambda)(\rho)\|_B \le \delta \tag{51}$$

and $\Lambda'$ can be found efficiently.

With the lemmas in hand the proof of Theorem 14 follows along the same lines as Theorem 6.


4 Algorithm for injective tensor norm 

In this section we present one further generalization, this time on the input space. While this final generalization
does not have natural applications in quantum information (to our knowledge), it does give perspective
on why it is natural to consider 1-LOCC measurements and entanglement-breaking channels.

First, we introduce some more definitions. Suppose that $\|\cdot\|_A$ and $\|\cdot\|_B$ are two norms. For $\Lambda$ an
operator from $A \to B$ define the operator norm

$$\|\Lambda\|_{A \to B} := \sup_{a \in B(A)} \|\Lambda(a)\|_B. \tag{52}$$

Define the injective tensor norm $A \otimes_{\mathrm{inj}} B$ by

$$\|x\|_{A \otimes_{\mathrm{inj}} B} = \sup_{a \in B(A^*),\; b \in B(B^*)} \langle a \otimes b, x \rangle. \tag{53}$$

Here $A^*$ is the space of linear functions from $A$ to $\mathbb{R}$, and $\|\alpha\|_{A^*} := \sup_{a \in B(A)} \alpha(a)$. For example, if $A = B = \ell_2$
then $A \otimes_{\mathrm{inj}} B$ is the usual operator norm for matrices, i.e. largest singular value. More generally $A^* \otimes_{\mathrm{inj}} B$
is isomorphic to the operator norm on maps from $A \to B$. Finally if $A, B, C$ are Banach spaces then define
the factorization norm $A \to C \to B$ for $\Lambda \in \mathcal{L}(A, B)$ by

$$\|\Lambda\|_{A \to C \to B} = \inf \left\{ \|\Lambda_1\|_{C \to B}\, \|\Lambda_2\|_{A \to C} : \Lambda_1 \in \mathcal{L}(C, B),\; \Lambda_2 \in \mathcal{L}(A, C),\; \Lambda = \Lambda_1 \Lambda_2 \right\}. \tag{54}$$
We can now (informally) state our generalized estimation theorem. Given an operator $\Lambda \in B(A \to \ell_1 \to B)$
we can estimate $\|\Lambda\|_{A \to B}$ efficiently.

For example, consider $h_{\mathrm{Sep}}$, which we considered in Section 2. In our new notation

$$h_{\mathrm{Sep}}(M) = \|M\|_{S_\infty \otimes_{\mathrm{inj}} S_\infty} = \|\hat{M}\|_{S_1 \to S_\infty}, \tag{55}$$

where $\hat{M}$ is the map defined by $\hat{M}(X) = \mathrm{tr}_A[M(X \otimes I)]$. The requirement that $M$ is 1-LOCC is roughly
equivalent to the requirement that

$$\|\hat{M}\|_{S_1 \to \ell_1 \to S_\infty} \le 1. \tag{56}$$

Theorem 23. Suppose $A, B$ are $d$-dimensional Banach spaces. Suppose $\|\Lambda\|_{A \to \ell_1 \to B} \le 1$ and that a good
factorization is known; i.e. $x_1^*, \ldots, x_n^* \in A^*$ and $y_1, \ldots, y_n \in B$ are given such that $\Lambda = \sum_{i=1}^n y_i x_i^*$,
$\sup_{a \in B(A)} \sum_{i=1}^n |x_i^*(a)| \le 1$ and $\max_i \|y_i\|_B \le 1$. Suppose further that algorithms exist for computing the
$A$ and $B$ norms running in times $T_A$, $T_B$ respectively. Let $\lambda$ denote the type-$\gamma$ constant of $B$. Then we can
estimate $\|\Lambda\|_{A \to B}$ to accuracy $\epsilon$ in time

$$T_A T_B\, \mathrm{poly}(d)\, n^{O((\lambda/\epsilon)^{\gamma/(\gamma - 1)})}. \tag{57}$$

The algorithm follows similar lines to the earlier algorithms. It lacks only the sparsification step since
we do not know how to extend Lemma 22 to this case.


Algorithm 24 (Algorithm for computing $\|\Lambda\|_{A \to B}$).

Input: $\{x_i^*\}_{i=1}^n$, $\{y_i\}_{i=1}^n$

Output: $p \in S_X$

1. Enumerate over all $p \in \mathcal{N}_k$, with $k = (2\lambda/\delta)^{\gamma/(\gamma - 1)}$.

(a) For each $p$, check whether there exists $q \in S_X$ with $\|p - q\|_{B,Y} \le \delta$.

(b) If so, compute $\|p\|_{B,Y}$.

2. Output $p$ such that $\|p\|_{B,Y}$ is the maximum.

The proof of Theorem 23 is almost the same as that of Theorem 14. The only new ingredient is checking
whether $p \in S_X$. This is equivalent to asking whether $\exists a \in B(A)$ such that $p_i = x_i^*(a)$. This is a convex
program which can be decided in time $\mathrm{poly}(d)\, T_A$ using the ellipsoid algorithm along with our assumption
that $\|\cdot\|_A$ can be computed in time $T_A$.
A Sparsification

In this appendix we prove two lemmas about sparsification: one (Lemma 7) for the problem of $h_{\mathrm{Sep}}$ and
the second (Lemma 22) for estimating $S_1 \to B$ norms. While the former is a special case of the latter, it
is also far more self-contained (requiring only the operator Hoeffding bound), so we recommend reading it
first.

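Proof of Lemma 7. Assume initially that each $\|Y_i\| = 1$. This is possible because we can always drop terms
with $Y_i = 0$ and then rewrite $M$ as

$$M = \sum_{i=1}^n \|Y_i\| X_i \otimes \frac{Y_i}{\|Y_i\|}. \tag{58}$$

Redefining $X_i$ appropriately we see that $\sum_i X_i \le I$ still holds.

Now write $M = \sum_{i=1}^n p_i W_i$ with $p_i = \mathrm{tr}(X_i \otimes Y_i)/\mathrm{tr}\, M$ and $W_i = X_i \otimes Y_i / p_i$. Sample $i_1, \ldots, i_{n'}$ according to $p$ and
define

$$A = \frac{1}{n'} \sum_{j=1}^{n'} W_{i_j} \quad \text{and} \quad B = \frac{1}{n'} \sum_{j=1}^{n'} X_{i_j}/p_{i_j}. \tag{59}$$

We would like to guarantee that

$$\|A - M\| \le \delta \tag{60a}$$

$$\|B - I_{d_1}\| \le \delta \tag{60b}$$

for some $\delta$ to be chosen later. We can use Lemma 3 here. To do so, note that

$$\|W_i\| \le \mathrm{tr}\, W_i \le \mathrm{tr}\, M \le d_1 d_2 \tag{61a}$$

$$\|X_i/p_i\| \le \frac{\mathrm{tr}\, M}{\mathrm{tr}\, Y_i} \le \mathrm{tr}\, M \le d_1 d_2, \quad \text{using the assumption that } \|Y_i\| = 1. \tag{61b}$$

Now we find that the probability that Eq. (60) fails to hold is

$$\le d_1 d_2 \exp\left( -\frac{n' \delta^2}{8 d_1^2 d_2^2} \right) + d_1 \exp\left( -\frac{n' \delta^2}{8 d_1^2 d_2^2} \right). \tag{62}$$

Taking $n' = 8 d_1^2 d_2^2 \log(2 d_1 d_2)/\delta^2$ we have that (60) holds with positive probability. Fix the corresponding
choice of $i_1, \ldots, i_{n'}$.

Choose $M' = A/(1 + \delta)$. Together with Eq. (60) this means that $M'$ is a valid 1-LOCC
measurement. By Eq. (60a) we can achieve our result by choosing $\delta = \epsilon/3$. Indeed

$$\|M' - M\| = \left\| \frac{A}{1 + \delta} - M \right\| \le \left\| \frac{A}{1 + \delta} - A \right\| + \|A - M\| \le \left( 1 - \frac{1}{1 + \delta} \right)(1 + \delta) + \delta \le 2\delta + \delta^2 \le \epsilon. \tag{63}$$

□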
























We now turn to the proof of Lemma 22, covering the case of general Banach spaces with bounded modulus
of smoothness.

We will need the following Azuma-type inequality from Naor [Nao12], who attributes it to Pisier. We
will state a weaker Hoeffding-type formulation that suffices for our purposes.

Lemma 25 (Theorem 1.5 of [Nao12]). Suppose $X_1, \ldots, X_k$ are independent random variables on $B(B)$ for
$B$ a Banach space with $\rho_B(\tau) \le s\tau^2$. Then

$$\Pr\left[ \left\| \frac{1}{k} \sum_{i=1}^k X_i \right\|_B \ge \delta \right] \le e^{s + 2 - ck\delta^2}. \tag{64}$$

Proof of Lemma 22. First we introduce notation. For a matrix $X$, define the map $\hat{X}$ by $\hat{X}(A) := \langle X, A \rangle$.
Thus $\Lambda = \sum_{i=1}^n Y_i \hat{X}_i$.

As in Lemma 7 we first drop terms with $Y_i = 0$ and rewrite $\Lambda = \sum_{i=1}^n \frac{Y_i}{\|Y_i\|_B} \cdot \|Y_i\|_B \hat{X}_i$. Redefine $X_i$, $Y_i$
appropriately and assume from now on that each $\|Y_i\|_B = 1$.

Define $p_i = \mathrm{tr}[X_i]/d$. Note that $p \in \mathbb{R}^n$ and $\|p\|_1 \le 1$. Sample $i_1, \ldots, i_k$ according to $p$ and let them take
value 0 with probability $1 - \sum_i p_i$. Set $\Lambda'' = \sum_{j=1}^k Y'_j \hat{X}'_j$ where $Y'_j = Y_{i_j}$ and $X'_j = X_{i_j}/(k p_{i_j})$. (Set $X'_j = Y'_j = 0$
if $i_j = 0$.) These choices mean that $\mathbb{E}[\Lambda''] = \Lambda$.

Let $X := \sum_{i=1}^n X_i$ and observe that $0 \le X \le I$. Additionally $\mathbb{E}[X'_j] = X/k$. Thus if we define
$Z_j := k X'_j - X$ then $\mathbb{E}[Z_j] = 0$ and $\|Z_j\| \le d$. The operator Hoeffding bound (Lemma 3) implies that
$\|\frac{1}{k} \sum_j Z_j\| \le \delta$ with probability $\ge 1 - d \exp(-k\delta^2/8d^2)$. When this occurs we have $\|\sum_{j=1}^k X'_j - X\| \le \delta$
and thus

$$\sum_{j=1}^k X'_j \le (1 + \delta) I. \tag{65}$$

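Next we attempt to bound the LHS of Eq. (51). First we can relax $\mathcal{D}_d$ to $B(S_1)$ and obtain

$$\max_{\rho \in \mathcal{D}_d} \|(\Lambda'' - \Lambda)(\rho)\|_B \le \|\Lambda'' - \Lambda\|_{S_1 \to B} = \|\Lambda'' - \mathbb{E}[\Lambda'']\|_{S_1 \to B}. \tag{66}$$

This formulation allows us to apply the symmetrization trick (Lemma 19) to obtain

$$\mathbb{E} \max_{\rho \in \mathcal{D}_d} \|(\Lambda'' - \Lambda)(\rho)\|_B \le 2\, \mathbb{E}_{i_1,\ldots,i_k}\, \mathbb{E}_{\varepsilon_1,\ldots,\varepsilon_k} \left\| \frac{1}{k} \sum_{j=1}^k \varepsilon_j Y_{i_j} \frac{\hat{X}_{i_j}}{p_{i_j}} \right\|_{S_1 \to B}. \tag{67}$$

We will bound this last quantity for any fixed $i_1, \ldots, i_k$. For $\rho \in B(S_1)$, define $q_j := \langle X_{i_j}, \rho \rangle/(k p_{i_j})$. Denote
the set of feasible $q$ by $S_{X, \vec{i}}$ where this notation emphasizes the dependence on both $X$ and $i_1, \ldots, i_k$. Then
$\sum_j |q_j| \le 1$, each $|q_j| \le d/k$ and

$$\Big\| \underbrace{\frac{1}{k} \sum_{j=1}^k \varepsilon_j Y_{i_j} \frac{\hat{X}_{i_j}}{p_{i_j}}}_{=: \Lambda_\varepsilon} \Big\|_{S_1 \to B} = \max_{q \in S_{X, \vec{i}}} \left\| \sum_{j=1}^k \varepsilon_j Y_{i_j} q_j \right\|_B. \tag{68}$$

Now let us fix $\rho$ (or equivalently $q$). Observe that $\|Y_{i_j} q_j\|_B \le d/k$. Then Lemma 25 implies that

$$\Pr_{\varepsilon_1,\ldots,\varepsilon_k}\left[ \left\| \sum_{j=1}^k \varepsilon_j Y_{i_j} q_j \right\|_B \ge \delta \right] \le e^{s + 2 - ck\delta^2/d^2}. \tag{69}$$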
According to Lemma II.2 of [HLSW04] there exists a net of pure states $\rho_1, \ldots, \rho_m \in \mathcal{D}_d$ such that $m \le
10^{2d}$ and for any pure state $\rho$, we have $\min_l \|\rho - \rho_l\|_1 \le 1/2$. Say that $\varepsilon_1, \ldots, \varepsilon_k$ is a good sequence if
$\|\sum_{j=1}^k \varepsilon_j Y_{i_j} \langle X_{i_j}, \rho_l \rangle/(k p_{i_j})\|_B \le \delta$ for all $l \in [m]$. By Eq. (69) and the union bound the probability that
$\varepsilon_1, \ldots, \varepsilon_k$ is bad (i.e. not good) is $\le 10^{2d} e^{s + 2 - ck\delta^2/d^2}$. For a bad sequence we still have that Eq. (68) is
$\le d$ by the triangle inequality. For a good sequence, let $\alpha$ denote Eq. (68) and let $\beta$ be the corresponding
maximum with $\rho$ restricted to the set $\{\rho_1, \ldots, \rho_m\}$. By our assumption that the sequence is good we have
$\beta \le \delta$. Observe that $\alpha = \max_{\rho \in B(S_1)} \|\Lambda_\varepsilon(\rho)\|_B$ and by convexity (and symmetry of the $\|\cdot\|_B$ norm) this
max is achieved for $\rho$ a pure state. Let $\rho_l$ satisfy $\|\rho - \rho_l\|_1 \le 1/2$. Then

$$\|\Lambda_\varepsilon(\rho)\|_B \le \|\Lambda_\varepsilon(\rho_l)\|_B + \|\Lambda_\varepsilon(\rho - \rho_l)\|_B \le \beta + \alpha/2. \tag{70}$$

Maximizing the LHS over $\rho$ we obtain $\alpha \le \beta + \alpha/2$, or equivalently $\alpha \le 2\beta \le 2\delta$. Thus Eq. (67) is

$$\le 4\delta + 2d\, 10^{2d} e^{s + 2 - ck\delta^2/d^2}. \tag{71}$$

Redefining $c$, this is $\le 5\delta$ when $k \ge c d^3/\delta^2$.

Since Eq. (67) controls the expectation with respect to $i_1, \ldots, i_k$, we conclude that for at least half of the
$i_1, \ldots, i_k$, the LHS of Eq. (51) is $\le 10\delta$. Since Eq. (65) holds with high probability ($\ge 1 - d \exp(-cd/8)$)
it follows that there exists a sequence of $i_1, \ldots, i_k$ that simultaneously fulfills both criteria. Fix this choice.
Finally we choose $\Lambda' = \Lambda''/(1 + \delta)$ so that the normalization condition on $X'_j$ is satisfied. This increases
the error by at most a further additive term of $\delta$. We conclude the proof by redefining $\delta$ to be $11\delta$. □


B Hardness of computing $2 \to q$ norms

In this section we extend the hardness results of [BBH+12] (Theorem 9.4, part 2) for estimating the $2 \to 4$
norm to general $2 \to q$ norms for even $q \ge 4$.

The next lemma is an extension of Lemma 9.5 of [BBH+12].


Lemma 26. Let $M \in L(\mathbb{C}^d \otimes \mathbb{C}^d)$ satisfy $0 \le M \le I$. Assume that either (case Y) $h_{\mathrm{Sep}(d,d)}(M) = 1$ or
(case N) $h_{\mathrm{Sep}(d,d)}(M) \le 1 - \delta$. Let $k$ be a positive integer and $q \ge 4$ an even positive integer. Then there
exists a matrix $A$ of size $d^{4kq} \times d^{2kq}$ such that in case Y, $\|A\|_{2 \to q} = 1$, and in case N, $\|A\|_{2 \to q} \le (1 - \delta/2)^k$.
Moreover, $A$ can be constructed efficiently from $M$.


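Proof. Consider the following operator

$$N = \left( M^{1/2}_{A_1 B_1} \otimes \cdots \otimes M^{1/2}_{A_{q/2} B_{q/2}} \right) \left( P_{A_1, \ldots, A_{q/2}} \otimes P_{B_1, \ldots, B_{q/2}} \right) \left( M^{1/2}_{A_1 B_1} \otimes \cdots \otimes M^{1/2}_{A_{q/2} B_{q/2}} \right), \tag{72}$$

with $P_{A_1, \ldots, A_{q/2}}$ the projector onto the symmetric subspace over $A_1, \ldots, A_{q/2}$. We will first relate $h_{\mathrm{Sep}^{q/2}(d^2)}(N)$
to $h_{\mathrm{Sep}(d,d)}(M)$, and then relate $h_{\mathrm{Sep}^{q/2}(d^2)}(N)$ to $\|A\|_{2 \to q}$ for a matrix $A$ of size $d^{4kq} \times d^{2kq}$.

First we show that in case Y, $h_{\mathrm{Sep}^{q/2}(d^2)}(N) = 1$. Indeed since there are unit vectors $x, y \in \mathbb{C}^d$ satisfying
$M_{AB}(x \otimes y) = x \otimes y$, we have

$$\begin{aligned}
h_{\mathrm{Sep}^{q/2}(d^2)}(N) &= \max_{v_1, \ldots, v_{q/2} \in \mathbb{C}^{d^2}} (v_1 \otimes \cdots \otimes v_{q/2})^* N (v_1 \otimes \cdots \otimes v_{q/2}) \\
&\ge (x^{\otimes q/2} \otimes y^{\otimes q/2})^* N (x^{\otimes q/2} \otimes y^{\otimes q/2}) \\
&= (x^{\otimes q/2} \otimes y^{\otimes q/2})^* \left( P_{A_1, \ldots, A_{q/2}} \otimes P_{B_1, \ldots, B_{q/2}} \right) (x^{\otimes q/2} \otimes y^{\otimes q/2}) = 1.
\end{aligned}$$

In case N we show that $h_{\mathrm{Sep}^{q/2}(d^2)}(N) \le 1 - \delta/2$. Note that

$$P_{A_1, \ldots, A_{q/2}} \le P_{A_1 A_2} \otimes I_{A_3 \ldots A_{q/2}}. \tag{73}$$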
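Then

$$\begin{aligned}
h_{\mathrm{Sep}^{q/2}(d^2)}(N) &= \max_{v_1, \ldots, v_{q/2} \in \mathbb{C}^{d^2}} (v_1 \otimes \cdots \otimes v_{q/2})^* N (v_1 \otimes \cdots \otimes v_{q/2}) \\
&\le \max_{v_1, v_2 \in \mathbb{C}^{d^2}} (v_1 \otimes v_2)^* \left( M^{1/2}_{A_1 B_1} \otimes M^{1/2}_{A_2 B_2} \right) \left( P_{A_1 A_2} \otimes P_{B_1 B_2} \right) \left( M^{1/2}_{A_1 B_1} \otimes M^{1/2}_{A_2 B_2} \right) (v_1 \otimes v_2) \\
&\le 1 - \delta/2,
\end{aligned} \tag{74}$$

where the last inequality follows from Lemma 9.6 of [BBH+12].

To construct a matrix $A$ of size $d^{4kq} \times d^{2kq}$ s.t. $\|A\|_{2 \to q} = h_{\mathrm{Sep}^{q/2}(d^2)}(N)$ we follow the proof of Lemma 9.5
of [BBH+12], the only difference being that we apply Wick's theorem to $P_{A_1, \ldots, A_{q/2}}$, i.e. there is a measure
$\mu$ over unit vectors s.t.

$$P_{A_1, \ldots, A_{q/2}} = \binom{d + q/2 - 1}{q/2} \int \mu(dv)\, (v v^*)^{\otimes q/2}. \tag{75}$$

□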


The basic idea of the lemma is to use the product test of [HM10] to force $v_1, \ldots, v_{q/2}$ to be product
states. Our proof can be summarized as saying that $q/2$ copies can enforce this more effectively than 2 copies
(assuming $q/2 \ge 2$), and therefore we obtain soundness at least as sharp as in [BBH+12]. This analysis may
be wasteful, since using more copies should improve the effectiveness of the product test.

The main result of this section is the following analogue of Theorem 9.4, part 2, of [BBH+12]:

Theorem 27. Let $\phi$ be a 3-SAT instance with $n$ variables and $O(n)$ clauses and $q \ge 4$ an even integer.
Determining whether $\phi$ is satisfiable can be reduced in polynomial time to determining whether $\|A\|_{2 \to q} \ge C$
or $\le c$ where $0 \le c < C$ and $A$ is an $m \times m$ matrix, where $m = \exp(q \sqrt{n}\, \mathrm{polylog}(n) \log(C/c))$.

This gives nontrivial hardness for super-constant $q$, in fact up to $O(\sqrt{\log d})$, but not yet all the way up
to $O(\log d)$, where multiplicative approximations are known to be easy.

Proof. Corollary 14 of [HM10] gives a reduction from determining satisfiability of $\phi$ to distinguishing between
$h_{\mathrm{Sep}(d,d)}(M) = 1$ and $h_{\mathrm{Sep}(d,d)}(M) \le 1/2$, with $0 \le M \le I$ that can be constructed in time $\mathrm{poly}(d)$ from $\phi$
with $d = \exp(\sqrt{n}\, \mathrm{polylog}(n))$. Applying Lemma 26 gives the result. □